Virtual machine memory snapshots in persistent memory

ABSTRACT

Various embodiments set forth techniques for taking a snapshot of virtual memory of a virtual machine. One technique includes allocating, in a persistent memory, one or more blocks associated with a virtual memory, annotating a first portion of the virtual memory for copying in a first pass, copying the first portion into the one or more blocks in the persistent memory in the first pass, receiving a write request associated with the first portion, and in response to receiving the write request: applying the write request to the first portion and annotating the first portion for copying in a second pass subsequent to the first pass.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 63/174,222, titled “VIRTUAL MACHINE MEMORY SNAPSHOTS IN PERSISTENT MEMORY,” and filed on Apr. 13, 2021. The subject matter of this related application is incorporated by reference in its entirety.

TECHNICAL FIELD

The contemplated embodiments relate generally to management of a virtual machine in a computing system and, more specifically, to virtual machine memory snapshots in persistent memory.

BACKGROUND

Virtualization is an important feature in modern computing systems, such as enterprise-level computing systems. By creating a virtual version of a once-physical item, various applications and operating systems can be abstracted away from the hardware and/or software underneath. A popular aspect of virtualization is a virtual machine (VM), which emulates a computer system and runs on top of another system. A VM can have its own virtual memory and may have access to any number of computing resources, including physical memory, secondary storage, networks, input/output devices, and/or the like, via a hypervisor.

As part of the operation and management of a VM, snapshots of the VM may be taken. Snapshots of a VM record the states of components of the VM (e.g., virtual devices, virtual disk, virtual memory) at a given time, which can be used for various purposes (e.g., restoring to a certain state after a crash). A typical approach to taking a snapshot of a VM includes pausing the VM and/or suspending access to virtual storage or memory within the VM, saving the states to physical persistent storage, and then un-pausing the VM and/or un-suspending the access to virtual storage or memory.

A drawback of this approach to taking snapshots of the VM is that such an approach can have a great impact on the performance of the VM, in particular the virtual memory of the VM. Because the virtual memory can include a large amount of data to snapshot, on the order of gigabytes or terabytes in some cases, and access to virtual memory is very latency-sensitive, any pause of the VM and/or suspension of access could cause delays in many operations of the VM, thus greatly impacting the performance of the VM as a whole. Such delays are very much undesirable, especially when the VM is deployed in a system that handles many time-sensitive operations.

Accordingly, there is a need for improved techniques for taking a snapshot of the virtual memory of a virtual machine.

SUMMARY

Various embodiments set forth one or more non-transitory computer-readable media storing program instructions that, when executed by one or more processors, cause the one or more processors to perform steps for taking a snapshot of virtual memory of a virtual machine. The steps include allocating, in a persistent memory, one or more blocks associated with a virtual memory; detecting a write request associated with a first portion of the virtual memory; in response to detecting the write request associated with the first portion, prioritizing the first portion; based on the prioritizing, copying the first portion into the one or more blocks in the persistent memory ahead of a second portion of the virtual memory; and after copying the first portion: applying the write request to the first portion; and copying the second portion into the one or more blocks in the persistent memory.

Various embodiments set forth one or more non-transitory computer-readable media storing program instructions that, when executed by one or more processors, cause the one or more processors to perform steps for taking a snapshot of virtual memory of a virtual machine. The steps include allocating, in a persistent memory, one or more blocks associated with a virtual memory; annotating a first portion of the virtual memory for copying in a first pass; copying the first portion into the one or more blocks in the persistent memory in the first pass; receiving a write request associated with the first portion; and in response to receiving the write request: applying the write request to the first portion; and annotating the first portion for copying in a second pass subsequent to the first pass.

Other embodiments include, without limitation, methods and systems that implement one or more aspects of the disclosed techniques, and one or more computer readable media including instructions for performing one or more aspects of the disclosed techniques.

BRIEF DESCRIPTION OF THE DRAWINGS

So that the manner in which the above recited features of the various embodiments can be understood in detail, a more particular description of the inventive concepts, briefly summarized above, may be had by reference to various embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments of the inventive concepts and are therefore not to be considered limiting of scope in any way, and that there are other equally effective embodiments.

FIG. 1 is a block diagram illustrating a virtual computing environment according to various embodiments of the present disclosure.

FIG. 2 is a block diagram illustrating a snapshot operation in the virtual computing environment of FIG. 1 according to various embodiments of the present disclosure.

FIG. 3 is a flow chart of method steps for taking a snapshot of a virtual memory of a virtual machine, according to various embodiments of the present disclosure.

FIGS. 4A-4E are diagrams illustrating example snapshot progress information associated with a snapshot operation, according to various embodiments of the present disclosure.

FIGS. 5A-5B include another flow chart of method steps for taking a snapshot of a virtual memory of a virtual machine, according to various embodiments of the present disclosure.

FIGS. 6A-6D are diagrams illustrating another example of snapshot progress information associated with a snapshot operation, according to various embodiments of the present disclosure.

FIG. 7 is yet another flow chart of method steps for taking a snapshot of a virtual memory of a virtual machine, according to various embodiments of the present disclosure.

FIGS. 8A-8D are diagrams illustrating yet another example of snapshot progress information associated with a snapshot operation, according to various embodiments of the present disclosure.

FIGS. 9A-9B include a further flow chart of method steps for taking a snapshot of a virtual memory of a virtual machine, according to various embodiments of the present disclosure.

FIGS. 10A-10D are block diagrams illustrating virtualization system architectures configured to implement one or more aspects of the present disclosure.

FIG. 11 is a block diagram illustrating a computer system configured to implement one or more aspects of the present disclosure.

For clarity, identical reference numbers have been used, where applicable, to designate identical elements that are common between figures. It is contemplated that features of one embodiment may be incorporated in other embodiments without further recitation.

DETAILED DESCRIPTION

In the following description, numerous specific details are set forth to provide a more thorough understanding of the various embodiments. However, it will be apparent to one skilled in the art that the inventive concepts may be practiced without one or more of these specific details.

FIG. 1 is a block diagram illustrating a virtual computing environment 100 according to various embodiments of the present disclosure. As shown in FIG. 1, virtual computing environment 100 is built around a virtual machine 120. In some embodiments, virtual computing environment 100 and/or virtual machine 120 may be implemented in a cloud computing system, such as a public cloud, a private cloud, or a hybrid cloud that includes a combination of an on-premise data center and a public cloud, a private cloud, and/or the like. In various embodiments, virtual machine 120 includes a collection of software instructions that serve to abstract details of underlying hardware or software components from one or more higher-level processing entities. Virtual machine 120 can serve as virtualization and/or emulation of a computer system on top of a physical computer system (e.g., emulate one platform on top of physical hardware running a different platform). Examples of underlying hardware and/or software that can be abstracted by virtual machine 120 include, without limitation, memory (e.g., volatile memory), storage (e.g., disk or other non-volatile storage), one or more devices (e.g., I/O devices, etc.), an operating system (e.g., guest operating system), and one or more applications. As shown, virtual machine 120 includes a virtual memory 122, a virtual disk 124, and one or more virtual devices 126. Although virtual machines, such as virtual machine 120, are described in further detail below as a reference example, techniques disclosed herein can also be applied to other types of virtual computing environments, such as containers, that provide isolated computing environments. Containers can be implemented via an abstraction layer that executes on top of the kernel of an operating system (OS) in a node (or a VM) and provides OS-level virtualization in which each container runs as an isolated process on the OS.

Virtual computing environment 100 includes physical hardware that hosts virtual machine 120 and on which virtual machine 120 runs. The physical hardware can include, for example, one or more processing units 160, a primary memory 110, a persistent memory 140, and storage 150.

Primary memory 110 can include volatile media. Examples of volatile media include dynamic memory such as random access memory (RAM) and dynamic random access memory (DRAM).

Storage 150 can include non-volatile storage media. Examples of non-volatile storage media include solid state storage devices (SSDs), optical or magnetic disks such as hard disk drives (HDDs), and/or hybrid disk drives, or optical or magnetic media drives.

Persistent memory 140 can include non-volatile random-access memory. Persistent memory 140 has characteristics of random-access memory (e.g., allows random access), but can retain data across power cycles (e.g., data persists when power is turned off, data persists across reboots). Furthermore, persistent memory 140 is byte-addressable. An example of persistent memory 140 is the INTEL® OPTANE™ PERSISTENT MEMORY by Intel Corporation. In some embodiments, persistent memory 140 can operate in a memory mode, an “AppDirect” mode, or in a dual mode. In the memory mode, persistent memory 140 operates like volatile memory (e.g., is byte-addressable), and accordingly can be used as additional primary memory 110. In the AppDirect mode, persistent memory 140 operates with data persistence (can retain data across power cycles, as described above) and is byte-addressable. In the dual mode, a portion of persistent memory 140 operates in memory mode, and the remainder operates in AppDirect mode. Another example of persistent memory 140 is a non-volatile dual inline memory module (NVDIMM).

Processing unit(s) 160 include any suitable processors implemented as a central processing unit (CPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), an artificial intelligence (AI) accelerator, any other type of processor, or a combination of different processors, such as a CPU configured to operate in conjunction with a GPU. In general, the one or more processing units 160 may be any technically feasible hardware unit capable of processing data and/or executing software applications.

Virtual computing environment 100 also includes a hypervisor 130. Hypervisor 130 is host software, running on the physical hardware, that can manage execution of and operations on virtual machine 120, and can serve as an intermediary between virtual machine 120 and the physical hardware hosting virtual machine 120. In some embodiments, hypervisor 130 includes emulator 134 (e.g., a virtual machine monitor), which can allocate space in primary memory 110 to store virtual memory 122 and perform various other functions associated with execution of virtual machine 120.

In some embodiments, a snapshotting application 132 runs on the physical hardware. Snapshotting application 132 is configured to perform snapshots of virtual machine 120 and/or components of virtual machine 120 (e.g., virtual memory 122, virtual disk 124, virtual device(s) 126). As shown, in some embodiments, snapshotting application 132 can be a component application or module of hypervisor 130, but in some other embodiments, snapshotting application 132 can be an application distinct from hypervisor 130 and running on the physical hardware. In some embodiments, hypervisor 130, snapshotting application 132, and emulator 134 are loaded into primary memory 110 and executed by processing unit(s) 160 from primary memory 110.

Virtual machine software 128 can run on virtual machine 120. Virtual machine software 128 can include any software configured to run on whatever hardware and/or software platform is being emulated or virtualized by virtual machine 120. In some embodiments, virtual machine software 128 includes a guest operating system (e.g., an operating system associated with the platform being emulated or virtualized by virtual machine 120).

In some embodiments, virtual computing environment 100 includes address translation for translating locations (e.g., addresses) in virtual memory 122 and/or virtual disk 124 to locations (e.g., addresses) in primary memory 110, persistent memory 140, and/or storage 150. The address translation can be performed using any technically feasible technique, including, for example and without limitation, extended page tables (EPT). In some embodiments, the address translation enables virtual machine software 128, which runs on virtual machine 120 that in turn runs via processing unit(s) 160, to directly access the portion of primary memory 110 allocated to virtual memory 122 without necessarily going through hypervisor 130.

Virtual memory 122 can have associated permissions. For example, virtual memory 122 can have read-only or read/write permission. In various embodiments, virtual memory 122 can have per-portion permissions (e.g., per page, per block). For example, a portion of virtual memory 122 could be read-only and the remainder could have read/write permission. In some embodiments, permissions can be enforced at the address translation level. For example, extended page tables can include permissions per page.
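
As a minimal, non-limiting sketch of permissions enforced at the address translation level, the following Python model keeps a write-permission flag alongside each page mapping and checks that flag during translation. The names `PageTable`, `map_page`, `translate`, and the 4096-byte `PAGE_SIZE` are assumptions made for illustration only; they do not describe the layout or behavior of any actual extended page table implementation.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class PageTable:
    """Toy translation table: guest page -> (host page, writable flag)."""

    def __init__(self):
        self.entries = {}

    def map_page(self, guest_page: int, host_page: int, writable: bool) -> None:
        self.entries[guest_page] = (host_page, writable)

    def translate(self, guest_addr: int, is_write: bool) -> int:
        """Return the host address, rejecting writes to read-only pages."""
        guest_page, offset = divmod(guest_addr, PAGE_SIZE)
        host_page, writable = self.entries[guest_page]
        if is_write and not writable:
            # In a real system this would surface as a fault handled by
            # the hypervisor; here it is modeled as an exception.
            raise PermissionError(f"page {guest_page} is read-only")
        return host_page * PAGE_SIZE + offset
```

In this sketch a read of a read-only page translates normally, while a write raises an error that a caller could trap and handle.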

In various embodiments, a snapshot of one or more components of virtual machine 120 can be taken. A snapshot captures the state(s) and/or data of the one or more components at a given time. Snapshot(s) can be taken for virtual memory 122, virtual disk 124, and/or virtual device(s) 126. Snapshots can be stored persistently (e.g., in storage 150) for later retrieval (e.g., for crash recovery or virtual machine cloning purposes). A conventional approach to taking a snapshot includes copying data contents in primary memory 110 that correspond to data in virtual memory 122 to storage 150.

A drawback of the above-described conventional approach to taking a snapshot is that the approach requires pausing virtual machine 120 to take the snapshot. Pausing virtual machine 120 causes a delay in operation of virtual machine 120. The delay can have a great impact on the performance of virtual machine 120, especially if virtual memory 122 is large (e.g., on the order of gigabytes or terabytes).

To address this and other drawbacks of taking a snapshot of a virtual machine, techniques for taking a snapshot to a persistent memory (e.g., persistent memory 140) are disclosed herein. Because copying data to persistent memory 140 is quicker than copying to storage 150, a snapshot can be taken with little or no pausing of virtual machine 120. Accordingly, taking a snapshot using the disclosed techniques can have a reduced impact on the performance of virtual machine 120.

FIG. 2 is a block diagram illustrating a snapshot operation in virtual computing environment 100 according to various embodiments of the present disclosure. As shown, emulator 134 in hypervisor 130 can generate a VM allocation 236, which allocates a portion of primary memory 110, primary memory (PM) portion 212, to serve as virtual memory 122 of virtual machine 120. Emulator 134 generates and tracks VM allocation 236. In some embodiments, emulator 134 can also generate an extended page table (EPT) 270 for translating addresses between virtual memory 122 and PM portion 212. When an instruction to access virtual memory 122 (e.g., a read) is issued by virtual machine software 128, the portions of PM portion 212 relevant to the instruction can be accessed directly using one or more addresses translated by EPT 270.

As described above, a snapshot of virtual machine 120, including virtual memory 122, can be taken. To take a snapshot of virtual memory 122, a snapshot of PM portion 212 is taken by copying the contents of PM portion 212 into persistent memory 140 as virtual memory snapshot 242, using any of the techniques described below. Virtual memory snapshot 242 can subsequently be copied or moved into storage 150 as virtual memory snapshot copy 252.

In various embodiments, processing unit(s) 160 can concurrently run emulator 134, and thereby run virtual machine 120, and take a snapshot of PM portion 212. For example, if processing unit(s) 160 includes at least two central processing units (CPUs), one CPU can run emulator 134 and another CPU can take a snapshot of PM portion 212 (e.g., by running snapshotting application 132 to take the snapshot). When virtual machine 120 is run concurrently with the taking of a snapshot of PM portion 212, snapshotting application 132 and/or emulator 134 can implement one or more techniques to ensure consistency of the snapshot against data write instructions that are received during the snapshotting process, the details of which are described below.

FIG. 3 is a flow chart of method steps for taking a snapshot of a virtual memory of a virtual machine, according to various embodiments of the present disclosure. Although the method steps are described with respect to the systems of FIGS. 1-2 and FIGS. 10A-11, persons skilled in the art will understand that any system configured to perform the method steps, in any order, falls within the scope of the various embodiments.

As shown, a method 300 begins at step 302, where a snapshotting application 132 allocates one or more blocks in persistent memory 140. The allocated block(s) are associated with virtual memory 122 of virtual machine 120, and correspondingly, associated with PM portion 212 allocated to virtual memory 122. In some embodiments, the allocated block(s) correspond to space in persistent memory 140 allocated for a virtual memory snapshot 242.

At step 304, the snapshotting application pauses virtual machine 120. Snapshotting application 132 pauses the execution, and also freezes the data and states, of virtual machine 120 (and accordingly also of virtual machine software 128). For example, snapshotting application 132 can signal emulator 134 to pause execution of virtual machine 120. Accordingly, the data of virtual memory 122 in PM portion 212 is frozen while virtual machine 120 is paused. While virtual machine 120 is paused, operations associated with virtual machine 120 (e.g., attempts to write to virtual memory 122) can be held back (e.g., trapped) by hypervisor 130 (e.g., queued by emulator 134). In some embodiments, operations that do not affect the data contents of virtual memory 122 and/or the states of virtual machine 120 (e.g., reads) need not be held back and can be executed even while virtual machine 120 is paused.

At step 306, the snapshotting application copies the virtual memory to the one or more blocks in the persistent memory. Snapshotting application 132 copies the contents of virtual memory 122, as stored in PM portion 212, into persistent memory 140 (e.g., into the block(s) in persistent memory 140 allocated for virtual memory snapshot 242 in step 302). For example, snapshotting application 132 can copy PM portion 212 into virtual memory snapshot 242 in persistent memory 140. When the copying in step 306 is completed, a full copy of PM portion 212 resides in the allocated blocks in persistent memory 140 as virtual memory snapshot 242.

At step 308, the snapshotting application resumes the virtual machine. After a snapshot copy of PM portion 212 is made as virtual memory snapshot 242, snapshotting application 132 can restart virtual machine 120 and/or unfreeze the data and states of virtual machine 120. Then method 300 can end. Any operations that were held back because of the pausing of virtual machine 120 can be applied after the resumption of virtual machine 120. Further, method 300 can be performed again to take a new snapshot of virtual memory 122.

In some embodiments, after a full copy of PM portion 212 is copied to persistent memory 140 as virtual memory snapshot 242 in step 306, snapshotting application 132 can copy or move virtual memory snapshot 242 in persistent memory 140 into storage 150 as virtual memory snapshot copy 252. After virtual memory snapshot 242 is copied or moved into storage 150 as virtual memory snapshot copy 252, snapshotting application 132 can deallocate the allocated blocks in persistent memory 140 holding virtual memory snapshot 242. Those deallocated blocks can then be reallocated (e.g., to a subsequent virtual memory snapshot 242). Method 300 can be performed again to take a new snapshot of virtual memory 122.
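
The pause, copy, and resume sequence of method 300 can be sketched as follows. This is a simplified single-process Python model under stated assumptions: `PausableVM` and `take_snapshot_paused` are hypothetical names, a `bytearray` stands in for PM portion 212, and an in-process buffer stands in for the allocated blocks in persistent memory 140. It is not intended as an implementation of any particular hypervisor.

```python
class PausableVM:
    """Hypothetical stand-in for a virtual machine and its memory."""

    def __init__(self, memory: bytes):
        self.memory = bytearray(memory)  # models the VM's memory contents
        self.paused = False
        self.held_writes = []            # writes held back while paused

    def write(self, offset: int, data: bytes) -> None:
        if self.paused:
            self.held_writes.append((offset, data))  # hold back the write
        else:
            self.memory[offset:offset + len(data)] = data

    def pause(self) -> None:
        self.paused = True

    def resume(self) -> None:
        self.paused = False
        pending, self.held_writes = self.held_writes, []
        for offset, data in pending:     # apply the held-back writes
            self.write(offset, data)


def take_snapshot_paused(vm: PausableVM) -> bytes:
    snapshot = bytearray(len(vm.memory))  # step 302: allocate space
    vm.pause()                            # step 304: pause the VM
    try:
        snapshot[:] = vm.memory           # step 306: copy memory contents
    finally:
        vm.resume()                       # step 308: resume the VM
    return bytes(snapshot)
```

The `try`/`finally` is a design choice of the sketch rather than something the description requires; it simply ensures the model VM is resumed even if the copy fails.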

FIGS. 4A-4E are diagrams illustrating example snapshot progress information associated with a snapshot operation, according to various embodiments of the present disclosure. In some embodiments, techniques for taking a snapshot of virtual memory 122 forgo the pausing of virtual machine 120 that is performed in method 300. One such technique is illustrated via the example snapshot progress information shown in FIGS. 4A-4E.

In FIG. 4A, a table 400 includes columns 402 and 404. Column 402 indicates identifiers of addressable unit portions (e.g., addressable blocks, addressable pages, memory addresses) of virtual memory 122 (or, more particularly, identifiers of unit portions of PM portion 212 serving as virtual memory 122). As shown, virtual memory 122 includes N blocks, numbered from 0 to N−1 in FIGS. 4A-4E for simplicity and ease of understanding. In various implementations, the identifiers can be the addresses of the unit portions. Column 404 indicates a snapshot status of each block of virtual memory 122. In some examples, the copied/not copied status of a block in column 404 can be recorded using a single bit. The snapshot status of a block indicates whether the block has been copied into an outstanding virtual memory snapshot 242 in persistent memory 140.
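
One possible way to record the copied/not copied status with a single bit per block is a packed bitmap, sketched below in Python. The class name `SnapshotStatus` and its methods are hypothetical and chosen only for illustration; the description above does not prescribe this layout.

```python
class SnapshotStatus:
    """One bit per block: 1 means copied, 0 means not copied."""

    def __init__(self, num_blocks: int):
        self.num_blocks = num_blocks
        # Round up so that every block has a bit available.
        self.bits = bytearray((num_blocks + 7) // 8)

    def mark_copied(self, block: int) -> None:
        self.bits[block // 8] |= 1 << (block % 8)

    def is_copied(self, block: int) -> bool:
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def reset(self) -> None:
        """Clear all bits before a new snapshot operation begins."""
        for i in range(len(self.bits)):
            self.bits[i] = 0
```

With this layout, tracking 200 blocks takes 25 bytes of status information.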

FIG. 4A also illustrates a table 470 corresponding to a subset of the contents (in particular, a subset of the columns or fields) of an EPT. Table 470 includes a column 472 indicating the identifiers of the blocks of virtual memory 122 and a column 474 indicating the permissions for each block of virtual memory 122. In various embodiments, columns 472 and 474 are columns or fields in an EPT (e.g., EPT 270) among other additional data, columns, and/or fields (e.g., physical address at primary memory 110, translation mapping from virtual address to physical address, etc.). In some embodiments, table 470 corresponds to a portion of EPT 270. For simplicity and ease of understanding, the description below assumes two possible permissions: read-write (“R/W”; the block can be read or written/modified) or read-only (the block can be read but not written to or otherwise modified). In various implementations, the permissions can be more complex (e.g., read-only, read-write, read-write but no delete, read-only and executable, etc.).

FIG. 4A further illustrates a normal queue 420 and a priority queue 430, which indicate the order of copying of blocks to an outstanding virtual memory snapshot 242. Both queues are shown as empty in FIG. 4A. Normal queue 420 and priority queue 430 are further described below.

FIG. 4A corresponds to a state in virtual computing environment 100 prior to a snapshot operation (e.g., between snapshot operations). That is, a previous virtual memory snapshot 242 has already been completed, and a new virtual memory snapshot 242 has not been started. Accordingly, the blocks have a not-copied status as shown in table 400; none of the blocks have been copied to a new virtual memory snapshot 242 yet. Also, the blocks have their current permissions prior to a new snapshot, which in this case are read-write permissions as shown in table 470. Of course, in various implementations, different blocks can have different permissions prior to the new snapshot (e.g., some blocks have read-write permission and others have read-only permission).

FIG. 4B corresponds to a state in virtual computing environment 100 when snapshotting application 132 initiates a snapshot operation to take a new virtual memory snapshot 242 of virtual memory 122. In preparation for taking the new virtual memory snapshot 242, snapshotting application 132 can change the permissions for the blocks of virtual memory 122 in table 470 to read-only. In some embodiments, the permissions in effect before the change in permissions (e.g., the permissions as shown in FIG. 4A) can be saved and stored in a memory or storage (e.g., primary memory 110, persistent memory 140, storage 150, another column of table 400 (not shown)) for later restoration. Accordingly, table 470 shown in FIG. 4B shows blocks 0 through N−1 as having read-only permissions. Also, snapshotting application 132 can enqueue the identifiers of blocks 0 through N−1 into normal queue 420. While the identifiers for the blocks are enqueued into normal queue 420 in ascending numerical order as shown, it should be appreciated that the identifiers for the blocks can be enqueued into normal queue 420 in any suitable order (e.g., random order, a predefined order, in ascending address order, in descending address order). Further, the permissions can be changed to any suitable permission that prohibits modification to the block (e.g., read-only, a more complex permission that prohibits writes and other modifications to the block) during the snapshot operation.
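
The preparation described for FIG. 4B (saving the prior permissions, restricting every block to read-only, and enqueuing the block identifiers) can be sketched as follows. The function name `begin_snapshot`, the permission strings, and the use of a dictionary keyed by block identifier are assumptions made for illustration; ascending order is used here, although any suitable order would serve.

```python
from collections import deque

READ_ONLY = "R"
READ_WRITE = "R/W"


def begin_snapshot(permissions: dict):
    """Restrict all blocks to read-only and queue them for copying.

    `permissions` maps block identifier -> permission and is modified in
    place. Returns the saved prior permissions and the normal queue.
    """
    saved = dict(permissions)            # kept for later restoration
    for block in permissions:
        permissions[block] = READ_ONLY   # prohibit modification
    normal_queue = deque(sorted(permissions))  # ascending identifier order
    return saved, normal_queue
```

After a block is copied, its entry in `saved` tells the caller which permission to restore.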

After the identifiers for the N blocks are enqueued and their permissions changed to read-only, snapshotting application 132 can proceed with the snapshot operation by copying virtual memory 122 block-by-block to the new virtual memory snapshot 242. Snapshotting application 132 can copy the blocks in the order of the identifiers for the blocks in normal queue 420. For example, snapshotting application 132 can dequeue the identifier for a block from the head of normal queue 420 and copy the block corresponding to the dequeued identifier to virtual memory snapshot 242. Snapshotting application 132 then restores the permission of the copied block to the permissions in effect prior to the snapshot operation.

FIG. 4C illustrates a state in virtual computing environment 100 during the snapshot operation. As shown, blocks 0 through 105 have been copied to the outstanding virtual memory snapshot 242. Accordingly, the identifiers for those blocks are no longer in normal queue 420, their permissions have been restored to read-write in table 470, and their snapshot statuses in column 404 are shown as copied.

Because virtual machine 120 is not paused during the snapshot operation, an operation to modify (e.g., to write to) a block can be received by hypervisor 130. For example, emulator 134 can receive a write request or operation, issued by virtual machine software 128, for a block in virtual memory 122, and that block either has been copied or is not yet copied to the outstanding virtual memory snapshot 242. If the write request or operation is for a block that has already been copied to the outstanding virtual memory snapshot 242, then the request or operation can be applied according to the current permission of the block (e.g., the permission as indicated in table 470). For example, a write request to write to blocks 0-3 can be applied as normal according to the permissions of blocks 0-3 indicated in table 470.

If the write request or operation is for a block that has not been copied yet to the outstanding virtual memory snapshot 242, then emulator 134 can withhold the write request or operation from being applied before the block is copied (e.g., enqueue the write request to a queue for operations on hold, trap the write request). Snapshotting application 132 can remove the identifier for the block from normal queue 420 and enqueue the identifier for the block into priority queue 430. For example, as shown in FIG. 4C, a write request has been received for block 107, which has not been copied to the outstanding virtual memory snapshot 242 yet. In response to the write request, snapshotting application 132 removes the identifier for block 107 from normal queue 420 and enqueues the identifier for block 107 into priority queue 430.

When priority queue 430 is non-empty, snapshotting application 132 can pause copying blocks in normal queue 420 to the outstanding virtual memory snapshot 242, and instead copy blocks in priority queue 430 to the outstanding virtual memory snapshot 242 until priority queue 430 is empty again. Accordingly, with the identifier for block 107 in priority queue 430, snapshotting application 132 can proceed to copy block 107 according to priority queue 430. As shown in FIG. 4D, after block 107 is copied to the outstanding virtual memory snapshot 242, the identifier for block 107 is no longer in priority queue 430, block 107 has a copied status, and the permissions of block 107 are restored. With block 107 copied and its permissions restored, the write request for block 107 can be applied. For example, in FIG. 4E, table 400 indicates a post-write block 107′ after the write is applied, and that pre-write block 107 has been copied to virtual memory snapshot 242. Further, once priority queue 430 is empty, snapshotting application 132 can resume copying blocks according to normal queue 420. As shown in FIG. 4E, after resumption of copying of blocks according to normal queue 420, the identifier for block 106 has been dequeued from normal queue 420 and block 106 has been copied to virtual memory snapshot 242.
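
The interaction between the two queues and a withheld write can be sketched with the following single-threaded Python model. The class `LiveSnapshot` and its method names are hypothetical, blocks are modeled as immutable `bytes` values in a list, and a per-block `writable` flag stands in for the permissions of table 470 (all blocks are assumed to have been read-write before the snapshot). The sketch illustrates only the ordering behavior described above, not an actual hypervisor mechanism.

```python
from collections import deque


class LiveSnapshot:
    """Toy model of copy-before-write with a normal and a priority queue."""

    def __init__(self, memory: list):
        self.memory = memory                     # live blocks of the VM
        self.snapshot = [None] * len(memory)     # snapshot being built
        self.writable = [False] * len(memory)    # all blocks restricted
        self.normal = deque(range(len(memory)))  # default copy order
        self.priority = deque()                  # blocks with held writes
        self.held = {}                           # block -> withheld data

    def request_write(self, block: int, data: bytes) -> None:
        if self.writable[block]:                 # already copied: apply
            self.memory[block] = data
            return
        self.held.setdefault(block, []).append(data)  # withhold the write
        if block in self.normal:                 # move to priority queue
            self.normal.remove(block)            # O(n); fine for a sketch
            self.priority.append(block)

    def copy_next(self) -> bool:
        """Copy one block, preferring the priority queue. False when done."""
        queue = self.priority or self.normal     # empty deque is falsy
        if not queue:
            return False
        block = queue.popleft()
        self.snapshot[block] = self.memory[block]  # copy pre-write contents
        self.writable[block] = True                # restore permission
        for data in self.held.pop(block, []):      # apply withheld writes
            self.memory[block] = data
        return True
```

For example, if a write to block 2 arrives after only block 0 has been copied, the next call to `copy_next` copies block 2 ahead of block 1, after which the withheld write is applied to the live memory while the snapshot retains the pre-write contents.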

It should be appreciated that the snapshot progress information and corresponding data structures illustrated in FIGS. 4A-4E are merely exemplary, and the progress of a snapshot operation can be tracked or monitored with more, fewer, and/or different data structures than those shown in FIGS. 4A-4E. For example, there need not be an actual table 400; the information in table 400 can be tracked indirectly based on whether an identifier for a block has been queued in normal queue 420 or priority queue 430 for copying to an outstanding virtual memory snapshot 242.

FIGS. 5A-5B include another flow chart of method steps for taking a snapshot of a virtual memory of a virtual machine, according to various embodiments of the present disclosure. Although the method steps are described with respect to the systems of FIGS. 1-2, 4A-4E, and FIGS. 10A-11, persons skilled in the art will understand that any system configured to perform the method steps, in any order, falls within the scope of the various embodiments.

As shown, a method 500 begins at step 502, where a snapshotting application 132 allocates one or more blocks in persistent memory 140. Step 502 is similar to step 302 in method 300.

At step 504, the snapshotting application restricts permissions of the virtual memory. Snapshotting application 132 modifies or otherwise restricts the permissions of virtual memory 122 to a read-only permission or another similar permission under which modification is prohibited. In some embodiments, snapshotting application 132 modifies or restricts per-portion (e.g., per-block or per-page) permissions of virtual memory 122. If a permission for virtual memory 122 or for a given portion therein is already read-only, then the permission can be left unchanged. The snapshotting application can also save the permissions that were in effect prior to the modification or restriction of the permissions, so that the prior permissions can be restored later (e.g., in step 512). Those saved permissions can be stored in a memory or storage medium (e.g., in primary memory 110, persistent memory 140, and/or in storage 150). In some embodiments, snapshotting application 132 restricts the permissions by modifying permission parameters recorded in extended page tables (EPTs) associated with virtual memory 122 (e.g., marking virtual memory 122 or portions thereof as recorded in the EPTs as read-only). For example, as shown in FIG. 4B, the permissions in permission column 474 for the blocks of virtual memory 122 in table 470 are modified to read-only.

At step 506, snapshotting application 132 determines whether a request to write data to a portion of virtual memory 122 whose permission is restricted (and thus has yet to be copied to the snapshot file) has been received from virtual machine software 128. If such a write request has not been received, then method 500 proceeds (506-No) to step 510, where snapshotting application 132 copies a portion of the virtual memory to the one or more blocks in the persistent memory allocated to virtual memory snapshot 242. For example, the identifier for the block could be dequeued from normal queue 420, and the block copied to virtual memory snapshot 242. Step 510 is similar to step 306 in method 300, with step 510 specifically copying a portion of virtual memory 122 to virtual memory snapshot 242 in persistent memory 140. The portion being copied can be a block of virtual memory 122. In method 500, virtual memory 122 is copied portion-by-portion (e.g., block-by-block) to the snapshot file in persistent memory 140. The identifiers for the portions of virtual memory 122 can be queued for copying in a predefined order (e.g., an address order from lowest address to highest, or vice versa) or in a random order. In some embodiments, snapshotting application 132 copies the portion of virtual memory 122 as stored in PM portion 212 into persistent memory 140.

For example, as shown in FIGS. 4B-4C, identifiers for blocks of virtual memory 122 are queued in normal queue 420 for copying the corresponding blocks to virtual memory snapshot 242. No writes were received for blocks 0-105, and thus those blocks were copied to virtual memory snapshot 242 in their queue order in normal queue 420.

At step 506, if such a write request has been received (e.g., trapped or otherwise withheld by hypervisor 130 as a fault due to the restricted permissions set in step 504), then method 500 proceeds (506-Yes) to step 508, where snapshotting application 132 prioritizes, for copying to the one or more allocated blocks in persistent memory, the restricted portion of virtual memory 122 to which the write request will write data. Hypervisor 130 can trap the write request and hold the write from being applied to the portion of virtual memory 122, and snapshotting application 132 can change the priority of the portion for copying to the highest priority (e.g., by making the portion the next to be copied, or by placing the portion in a queue of portions to be copied before other portions of virtual memory 122). In some embodiments, the restricted portion is prioritized by putting the portion in a priority queue for higher-priority copying. For example, in FIG. 4C, the identifier for block 107, for which a write has been received, is removed from normal queue 420 and enqueued into priority queue 430. Method 500 then proceeds to step 510, where snapshotting application 132 copies a portion of the virtual memory to the one or more allocated blocks in the persistent memory. As with step 510 following step 506-No described above, snapshotting application 132 copies a portion of virtual memory 122, as stored in primary memory 110, into virtual memory snapshot 242 in persistent memory 140, with the difference here being that the portion being copied is the portion that is prioritized (e.g., queued in priority queue 430) in step 508 as opposed to the next portion to be copied based on the predefined order (e.g., according to normal queue 420).

At step 512, the snapshotting application restores the permissions of the copied portion of the virtual memory. For whatever portion of virtual memory 122 was copied to the snapshot file in persistent memory 140, snapshotting application 132 restores the permissions of the portion that were in effect prior to the restrictions placed on the portion in step 504. In some embodiments, snapshotting application 132 restores the permissions by modifying a permission recorded in an EPT associated with the portion to the previous, pre-restriction permission. For example, in table 470, the permissions of copied blocks are restored to read-write, where those permissions had been changed to read-only for the snapshot operation.

At step 514, the snapshotting application applies the requested write to the portion of the virtual memory, if a write was determined to be received in step 506. If a write request was received for the portion that was copied, then the write can be applied to the portion in virtual memory 122. For example, as shown in FIGS. 4C-4E, after block 107 is copied to virtual memory snapshot 242 and the permission for block 107 is restored to the prior permission, the write is applied to modify block 107 to block 107′. If a write request for the portion was not received, then method 500 can proceed to step 516.

At step 516, snapshotting application 132 determines whether the virtual memory has been fully copied to the persistent memory. If virtual memory 122 has been fully copied to virtual memory snapshot 242 in persistent memory 140, then method 500 ends (516-Yes). If there are portions of virtual memory 122 that have yet to be copied to the snapshot file in persistent memory 140, then method 500 proceeds (516-No) back to step 506. In some embodiments, after method 500 ends, virtual memory snapshot 242 in persistent memory 140 can be moved to storage 150 as virtual memory snapshot 252. After virtual memory snapshot 242 is moved to storage 150, snapshotting application 132 can deallocate the allocated blocks in persistent memory 140 holding virtual memory snapshot 242. Those deallocated blocks can then be reallocated (e.g., to a subsequent virtual memory snapshot 242). Method 500 can be performed again to take a new snapshot of virtual memory 122.
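As a rough illustration of how steps 502-516 fit together, the following Python sketch simulates method 500 on a small list of blocks. It is a simplification under stated assumptions, not the claimed implementation: the function name, the `pending_writes` mapping (loop iteration to a `(block_id, data)` write standing in for guest activity), and the string permission values are all hypothetical, every block is assumed to start as read-write, and at most one write arrives per iteration.

```python
from collections import deque

READ_ONLY, READ_WRITE = "ro", "rw"  # hypothetical permission values


def snapshot_with_write_protection(memory, pending_writes):
    """Simulate method 500; returns the snapshot as a list of block contents."""
    n = len(memory)
    snapshot = [None] * n                      # step 502: allocate blocks
    saved = {b: READ_WRITE for b in range(n)}  # step 504: save prior permissions
    perms = {b: READ_ONLY for b in range(n)}   # step 504: restrict to read-only
    normal, priority = deque(range(n)), deque()
    held = {}                                  # trapped writes, keyed by block
    tick = 0
    while normal or priority:                  # step 516: anything left to copy?
        write = pending_writes.get(tick)       # step 506: did a write arrive?
        tick += 1
        if write is not None:
            target, data = write
            if perms[target] == READ_ONLY:     # fault: hold write, prioritize
                held[target] = data            # step 508
                normal.remove(target)
                priority.append(target)
            else:                              # block already copied
                memory[target] = data
        block_id = priority.popleft() if priority else normal.popleft()
        snapshot[block_id] = memory[block_id]  # step 510: copy the block
        perms[block_id] = saved[block_id]      # step 512: restore permission
        if block_id in held:                   # step 514: apply the held write
            memory[block_id] = held.pop(block_id)
    return snapshot


memory = ["a", "b", "c", "d"]
snap = snapshot_with_write_protection(memory, {1: (3, "D"), 2: (0, "A")})
```

In this run, the write to block 3 arrives before block 3 is copied, so block 3 is copied ahead of order and only then modified, while the write to the already-copied block 0 is applied directly. The snapshot therefore holds the pre-write contents even though the simulated guest was never paused.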

FIGS. 6A-6D are diagrams illustrating another example of snapshot progress information associated with a snapshot operation, according to various embodiments of the present disclosure. FIGS. 6A-6D illustrate another technique for taking a snapshot of virtual memory 122 that forgoes pausing virtual machine 120.

In FIG. 6A, a table 600 includes columns 602, 604, and 606. Column 602 indicates identifiers of addressable unit portions (e.g., addressable blocks, addressable pages, memory addresses) of virtual memory 122 (or, more particularly, identifiers of unit portions of PM portion 212 serving as virtual memory 122). As shown, virtual memory 122 includes N blocks, numbered from 0 to N−1 in FIGS. 6A-6D for simplicity and ease of understanding. In various implementations, the identifiers can be the addresses of the unit portions. Column 604 indicates a snapshot status of each block of virtual memory 122. The snapshot status of a block indicates whether the block has been copied into an outstanding virtual memory snapshot 242 in persistent memory 140. In some examples, the copied/not copied status of a block in column 604 can be recorded using a single bit. Table 600 also includes column 606, which indicates whether a write has been received for a block in a current snapshot operation. In some examples, the write received/not received status of a block in column 606 can be recorded using a single bit.
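One way to record such single-bit statuses is a packed bitmap with one bit per block. The sketch below is only an illustration; the `BlockBitmap` class and its method names are hypothetical, and two independent bitmaps stand in for columns 604 and 606.

```python
class BlockBitmap:
    """Hypothetical helper: one status bit per block, packed eight per byte."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)  # all bits start at 0

    def set(self, block_id):
        self.bits[block_id // 8] |= 1 << (block_id % 8)

    def clear(self, block_id):
        self.bits[block_id // 8] &= ~(1 << (block_id % 8)) & 0xFF

    def test(self, block_id):
        return bool(self.bits[block_id // 8] & (1 << (block_id % 8)))


copied = BlockBitmap(1024)          # stands in for column 604
write_received = BlockBitmap(1024)  # stands in for column 606

for block_id in range(106):         # blocks 0-105 have been copied
    copied.set(block_id)
write_received.set(108)             # a write is pending for block 108
```

With this layout, the status of 1024 blocks occupies 128 bytes per column, which keeps the tracking overhead small relative to the memory being snapshotted.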

FIG. 6A corresponds to a state in virtual computing environment 100 prior to a snapshot operation (e.g., between snapshot operations). That is, a previous virtual memory snapshot 242 had already been completed, and a new virtual memory snapshot 242 has not been started. Accordingly, the blocks have a not-copied status as shown in table 600; none of the blocks have been copied to a new virtual memory snapshot 242 yet. Also, none of the blocks have an outstanding write request or operation during a snapshot operation. Snapshotting application 132 can initiate a snapshot operation to copy the blocks of virtual memory 122 to virtual memory snapshot 242 block-by-block. The blocks can be copied in a predefined order (e.g., ascending or descending address or identifier order) or in a random order. In some embodiments, the copying order can be managed via a queue similar to normal queue 420.

FIG. 6B corresponds to a state in virtual computing environment 100 during a snapshot operation that snapshotting application 132 has begun, in which some blocks have been copied to virtual memory snapshot 242. As with FIGS. 4A-4E, snapshotting application 132 can copy virtual memory 122 to virtual memory snapshot 242 block-by-block. Unlike in FIGS. 4A-4E, however, the permissions of the blocks need not be changed to a read-only permission. As shown in FIG. 6B, for example, blocks 0-105 have been copied without any outstanding write requests or operations for those blocks received during the snapshot operation. After those blocks have been copied, any write requests or operations for those copied blocks do not affect the consistency of the outstanding virtual memory snapshot 242, and thus those write requests or operations can be applied to the already-copied blocks. Accordingly, column 606 for those already-copied blocks indicates a not-applicable (N/A) status, indicating that the status of whether a write was received for those blocks is no longer applicable with respect to the consistency of those blocks for the current snapshot operation.

FIG. 6B also indicates that a write has been received for block 108, which has not been copied to virtual memory snapshot 242 yet. The write for block 108 is not applied yet to block 108, but is instead trapped by emulator 134 until after block 108 is copied to virtual memory snapshot 242. Snapshotting application 132 can pause copying of blocks based on the predefined order, and instead proceed to copy block 108 out-of-order. Thus, snapshotting application 132, instead of next copying block 106 in accordance with the original copying order, next copies block 108 to virtual memory snapshot 242 ahead of order. After block 108 is copied to virtual memory snapshot 242, the write request to block 108 is applied, and snapshotting application 132 can resume copying blocks based on the original copying order (e.g., in address or identifier order). FIG. 6C shows block 108 as having been copied and the write applied to modify block 108 to block 108′.

In FIG. 6D, snapshotting application 132 has resumed copying blocks based on the original copying order. Accordingly, snapshotting application 132 has copied block 106, which was the first not-copied block in the original copying order. The status for block 106 in column 604 is shown as copied.

It should be appreciated that the snapshot progress information and corresponding data structures illustrated in FIGS. 6A-6D are merely exemplary, and the progress of a snapshot operation can be tracked or monitored with more, fewer, and/or different data structures than those shown in FIGS. 6A-6D.

FIG. 7 includes yet another flow chart of method steps for taking a snapshot of a virtual memory of a virtual machine, according to various embodiments of the present disclosure. Although the method steps are described with respect to the systems of FIGS. 1-2, 6A-6D, and 10A-11, persons skilled in the art will understand that any system configured to perform the method steps, in any order, falls within the scope of the various embodiments.

As shown, a method 700 begins at step 702, where a snapshotting application 132 allocates one or more blocks in persistent memory 140. Step 702 is similar to step 302 in method 300 or step 502 in method 500.

At step 706, snapshotting application 132 determines whether a request to write data to a portion of virtual memory 122 that has yet to be copied to the snapshot file has been received. Step 706 is similar to step 506 in method 500, with a difference being that, in method 700, permissions for virtual memory 122 and/or for portions thereof have not been restricted or otherwise modified as in step 504 in method 500. In some embodiments, snapshotting application 132, in conjunction with emulator 134, traps a received write request to a portion of virtual memory 122 that is not yet copied to virtual memory snapshot 242. If such a write request has not been received, then method 700 proceeds (706-No) to step 708, where snapshotting application 132 copies a portion of the virtual memory to virtual memory snapshot 242 in persistent memory 140. The portion that is copied can be a next portion according to a predefined order (e.g., address order). For example, FIG. 6B shows blocks 0-105 having a copied status, where those blocks had been copied to virtual memory snapshot 242 in the predefined order. Step 708 is similar to step 510 in method 500, in that a portion of virtual memory 122 is being copied to virtual memory snapshot 242 in persistent memory 140. Method 700 then proceeds to step 716.

At step 706, if such a write request has been received, then method 700 proceeds (706-Yes) to step 710, where snapshotting application 132 copies the portion of virtual memory 122 for which the write is requested to the one or more allocated blocks in the persistent memory. Hypervisor 130 (e.g., emulator 134) can trap the write and hold the write from being applied to the portion of virtual memory 122, and snapshotting application 132 can change the order of copying to put the portion for which the write request is received ahead of the remaining portions to be copied. In some embodiments, snapshotting application 132 pauses copying according to the original order (e.g., the predefined order) and proceeds to copy the portion for which the write is received ahead of order. For example, as shown in FIGS. 6B-6C, block 108, for which a write is received, is copied to virtual memory snapshot 242 ahead of block 106, even though block 106 would have been copied before block 108 in the original order. Snapshotting application 132 copies that portion to the allocated blocks (e.g., to virtual memory snapshot 242) in persistent memory 140. Accordingly, the copying aspect of step 710 is similar to that of step 708, with a difference being that in step 710 the portion for which the write request is received is copied out of the predefined order and ahead of remaining portions to be copied. From step 710, method 700 then proceeds to step 714.

In some embodiments, hypervisor 130 (e.g., emulator 134) traps writes by removing page tables (e.g., EPT 270) or otherwise disabling access to page tables or revoking permissions to access virtual memory 122.

At step 714, the snapshotting application applies the write corresponding to the write request that was determined to be received in step 706 to the portion of the virtual memory. Step 714 is similar to step 514 in method 500.

At step 716, snapshotting application 132 determines whether the virtual memory has been fully copied to the persistent memory. If virtual memory 122 has been fully copied to virtual memory snapshot 242 in persistent memory 140 (e.g., data at each memory address in virtual memory 122 has been copied), then method 700 ends. If there are portions of virtual memory 122 that have yet to be copied to the snapshot file in persistent memory 140, then method 700 proceeds back to step 706.

In some embodiments, after method 700 ends, virtual memory snapshot 242 in persistent memory 140 can be moved to storage 150 as virtual memory snapshot copy 252. After virtual memory snapshot 242 is moved to storage 150, snapshotting application 132 can deallocate the allocated blocks in persistent memory 140 holding virtual memory snapshot 242. Those deallocated blocks can then be reallocated (e.g., to a subsequent virtual memory snapshot 242). Method 700 can be performed again to take a new snapshot of virtual memory 122.
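The control flow of method 700 can be sketched in Python as follows. This is an illustrative simulation only: the function name and the `pending_writes` mapping (loop iteration to a `(block_id, data)` write) are hypothetical stand-ins for the trapping performed by hypervisor 130, the predefined order is assumed to be ascending block identifier, and no permissions are modified.

```python
def snapshot_copy_before_write(memory, pending_writes):
    """Simulate method 700; returns (snapshot, order in which blocks were copied)."""
    n = len(memory)
    snapshot = [None] * n                    # step 702: allocate blocks
    copied = [False] * n                     # per-block copied status
    copy_order = []
    tick = 0
    while not all(copied):                   # step 716: fully copied yet?
        write = pending_writes.get(tick)     # step 706: did a write arrive?
        tick += 1
        if write is None:
            block_id = copied.index(False)   # step 708: next block in order
            snapshot[block_id] = memory[block_id]
            copied[block_id] = True
            copy_order.append(block_id)
        else:
            target, data = write
            if not copied[target]:           # step 710: copy ahead of order
                snapshot[target] = memory[target]
                copied[target] = True
                copy_order.append(target)
            memory[target] = data            # step 714: apply the held write
    return snapshot, copy_order


memory = ["a", "b", "c", "d"]
snap, order = snapshot_copy_before_write(memory, {1: (2, "C")})
```

Here the write to block 2 arrives while block 2 is still uncopied, so block 2 is copied before block 1 and the write is applied afterward, mirroring how block 108 is copied ahead of block 106 in FIGS. 6B-6D.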

FIGS. 8A-8D are diagrams illustrating another example of snapshot progress information associated with a snapshot operation, according to various embodiments of the present disclosure. FIGS. 8A-8D illustrate another technique for taking a snapshot of virtual memory 122 that forgoes pausing virtual machine 120.

In FIG. 8A, a table 800 includes columns 802, 804, and 806. Column 802 indicates identifiers of addressable unit portions (e.g., addressable blocks, addressable pages, memory addresses) of virtual memory 122 (or, more particularly, identifiers of unit portions of PM portion 212 serving as virtual memory 122). As shown, virtual memory 122 includes N blocks, numbered from 0 to N−1 in FIGS. 8A-8D for simplicity and ease of understanding. In various implementations, the identifiers can be the addresses of the unit portions. Column 804 indicates an annotation of each block. In some embodiments, the annotation can be a single bit (e.g., a bit flag) of value 0 or 1. The annotation can be used to track a snapshot status of the corresponding block of virtual memory 122, details of which are described below. Table 800 also includes column 806, which indicates whether a write has been received for a block in a current snapshot operation. In some embodiments, column 806 is optional and can be omitted.

FIG. 8A corresponds to a state in virtual computing environment 100 in which snapshotting application 132 has initiated a snapshot operation but has not yet copied any block. When initiating the snapshot operation, snapshotting application 132 annotates each block by setting the annotation of each block to 1, as shown in FIG. 8A. The value of 1 for the annotation indicates that the block is to be copied in the current snapshot operation to the outstanding virtual memory snapshot 242. Also, none of the blocks have an outstanding write request or operation yet in the current snapshot operation, as shown in FIG. 8A. Snapshotting application 132 can proceed to copy the blocks of virtual memory 122 to virtual memory snapshot 242 block-by-block. The blocks can be copied in a predefined order (e.g., ascending or descending address or identifier order) or in a random order.

FIG. 8B corresponds to a state in virtual computing environment 100 during a snapshot operation that snapshotting application 132 has begun, in which some blocks have been copied to virtual memory snapshot 242. As with FIGS. 4A-4E or 6A-6D, snapshotting application 132 can copy virtual memory 122 to virtual memory snapshot 242 block-by-block. In particular, snapshotting application 132 copies blocks whose annotation is 1 to virtual memory snapshot 242. Unlike in FIGS. 4A-4E, however, the permissions of the blocks need not be changed to a read-only permission. As shown in FIG. 8B, for example, blocks 0-105 have been copied. For each copied block, snapshotting application 132 un-annotates the block by resetting the annotation of the copied block to 0. Accordingly, the annotation for blocks 0-105 is 0 as shown in column 804 in FIG. 8B.

Also shown in FIG. 8B is that respective write requests have been received for blocks 105 and 107 in the current snapshot operation. The write request for block 105 is received after block 105 was already copied and its annotation was reset to 0, and the write request for block 107 is received before block 107 is copied (and accordingly while its annotation is still 1).

The write requests are applied without regard to whether the block has been copied or not. Accordingly, FIG. 8C shows table 800 after the write requests are applied to blocks 105 and 107 respectively. Table 800 now shows blocks 105′ and 107′ to indicate that those blocks have been modified by respective write requests. If a write request is applied to a block whose annotation is 1, then that block can be copied to virtual memory snapshot 242, and its annotation reset to 0, in the normal order for that block; the block as modified by the write request is copied to virtual memory snapshot 242. If a write request is applied to a block whose annotation is 0, then the annotation for that block is set to 1 again after the write request is applied, so that the block can be copied to virtual memory snapshot 242 again during a later portion of the current snapshot operation. Accordingly, FIG. 8C shows the annotation for block 105′ set to 1, and the annotation for block 107′ set to 0, indicating that block 107′ has been copied to virtual memory snapshot 242 after the write to block 107 and that block 105′ is to be copied to virtual memory snapshot 242 to update the copy of block 105 in virtual memory snapshot 242 due to the write to block 105 after block 105 was copied to virtual memory snapshot 242. FIG. 8C also shows that blocks 106 and 108 through N−1 have been copied and their annotations reset to 0. Accordingly, FIG. 8C shows that blocks 0-105, 106, 107′, and 108-(N−1) have been copied to virtual memory snapshot 242, and block 105′ is to be copied to virtual memory snapshot 242 in a later portion of the current snapshot operation.

FIG. 8D shows table 800 after snapshotting application 132 has copied block 105′ to virtual memory snapshot 242. The annotation of block 105′ has been reset to 0. With the annotations of blocks 0-(N−1) being 0, the snapshot operation for the outstanding virtual memory snapshot 242 is complete.

It should be appreciated that the snapshot progress information and corresponding data structures illustrated in FIGS. 8A-8D are merely exemplary, and the progress of a snapshot operation can be tracked or monitored with more, fewer, and/or different data structures than those shown in FIGS. 8A-8D.

FIGS. 9A-9B include a further flow chart of method steps for taking a snapshot of a virtual memory of a virtual machine, according to various embodiments of the present disclosure. Although the method steps are described with respect to the systems of FIGS. 1-2 and 10A-11, persons skilled in the art will understand that any system configured to perform the method steps, in any order, falls within the scope of the various embodiments.

As shown, a method 900 begins at step 902, where a snapshotting application 132 allocates one or more blocks in persistent memory 140. Step 902 is similar to step 302 in method 300, step 502 in method 500, or step 702 in method 700.

At step 904, the snapshotting application annotates portions of virtual memory 122 for copying. Snapshotting application 132 annotates each portion (e.g., each memory address) of virtual memory 122 as a portion to be copied to virtual memory snapshot 242 in persistent memory 140. For example, as shown in FIG. 8A, the annotations for the blocks in table 800 are set to 1. In some embodiments, the annotation includes an indication of a pass of the current snapshot operation in which the portion is to be copied. Accordingly, for a first pass to take a snapshot of virtual memory 122, each portion of virtual memory 122 can be annotated for the first pass and copied in the first pass. As described below, portions to which writes are applied can be annotated for copying in a second or subsequent pass.

At step 906, snapshotting application 132 copies an annotated portion of the virtual memory to the one or more allocated blocks in the persistent memory. Step 906 is similar to step 510 or 708 described above. Snapshotting application 132 can copy a portion that is annotated for the current pass (e.g., a block whose annotation is currently set to 1). In some embodiments, snapshotting application 132 copies annotated portions in a predefined order (e.g., in ascending or descending address or identifier order). For example, as described above with reference to FIG. 8B, snapshotting application 132 has so far copied blocks 0-105 according to a predefined order.

At step 907, snapshotting application 132 un-annotates the portion that is copied in step 906. With the portion being un-annotated, the portion need not be copied again in subsequent passes unless the portion is written to after being copied, as described below. For example, as shown in FIG. 8B, the annotations of copied blocks 0-105 have been reset to 0.

At step 908, snapshotting application 132 determines whether a request to write data to a portion of virtual memory 122 has been received from virtual machine software 128. If such a write request has not been received, then method 900 proceeds (908-No) to step 914.

At step 908, if such a write request has been received, then method 900 proceeds (908-Yes) to step 910, where snapshotting application 132 applies the write to the portion of virtual memory 122 for which the write request is received. For example, as shown in FIGS. 8B-8C, write requests have been received for blocks 105 and 107 and have been applied to those blocks, modifying those blocks to blocks 105′ and 107′, respectively.

At step 912, snapshotting application 132 annotates the portion to which the write was applied for copying to virtual memory snapshot 242 in a next pass in the snapshot operation, if the portion is not already annotated. The annotation is similar to that made in step 904, with the annotation made in step 912 indicating that the portion is to be copied to persistent memory 140 again in a next pass. For example, as shown in FIG. 8C, the annotation for block 105′ has been set to 1, indicating that block 105′ is to be copied to virtual memory snapshot 242. If the portion is already annotated (e.g., the portion still has the annotation made in step 904, or the portion has an annotation resulting from a prior write), the portion need not be annotated again. For example, as shown in FIG. 8B, the annotation for block 107′ remains at 1 because block 107′ had yet to be copied to virtual memory snapshot 242 when the write request was applied. Method 900 then proceeds to step 914.

At step 914, snapshotting application 132 determines whether there are any portions annotated for the current pass that remain to be copied. If there are portions annotated for the current pass that have yet to be copied in the current pass, then method 900 proceeds (914-Yes) back to step 906, where snapshotting application 132 can copy another portion annotated for the current pass. For example, in FIGS. 8B-8C, after writing block 107′, snapshotting application 132 further proceeds to copy blocks 108-(N−1) in order.

If there are no more portions annotated for copying in the current pass that have yet to be copied in the current pass, then method 900 proceeds (914-No) to step 916, where snapshotting application 132 determines whether there are any portions annotated (e.g., blocks whose annotation is set again to 1) for a next pass. If there are no portions annotated for a next pass, such as is shown in FIG. 8D, then method 900 ends (916-No). If there are one or more portions annotated for the next pass, then method 900 proceeds (916-Yes) to step 906 and advances to the next pass, where snapshotting application 132 can copy a portion annotated for the next, now current, pass to virtual memory snapshot 242 in persistent memory 140. For example, as shown in FIG. 8C, snapshotting application 132 can proceed with another pass to copy block 105′, whose annotation was set again to 1, to virtual memory snapshot 242 in that pass.

In some embodiments, after method 900 ends, virtual memory snapshot 242 in persistent memory 140 can be moved to storage 150 as virtual memory snapshot copy 252. After virtual memory snapshot 242 is moved to storage 150, snapshotting application 132 can deallocate the allocated blocks in persistent memory 140 holding virtual memory snapshot 242. Those deallocated blocks can then be reallocated (e.g., to a subsequent virtual memory snapshot 242). Method 900 can be performed again to take a new snapshot of virtual memory 122.
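The multi-pass copying of method 900 can be sketched in Python as follows. The sketch is a simplified simulation under stated assumptions: the function name and the `pending_writes` mapping (copy count to a `(block_id, data)` write) are hypothetical, the annotation is modeled as a single boolean per block as in FIGS. 8A-8D rather than as a per-pass indication, blocks are visited in ascending identifier order, and the simulated guest issues only a finite number of writes so that the loop terminates.

```python
def snapshot_multi_pass(memory, pending_writes):
    """Simulate method 900; returns (snapshot, number of passes performed)."""
    n = len(memory)
    snapshot = [None] * n                  # step 902: allocate blocks
    annotated = [True] * n                 # step 904: annotate every block
    passes = 0
    tick = 0
    while any(annotated):                  # step 916: anything left for a pass?
        passes += 1
        for block_id in range(n):          # step 914: walk the annotated blocks
            if not annotated[block_id]:
                continue
            snapshot[block_id] = memory[block_id]  # step 906: copy the block
            annotated[block_id] = False            # step 907: un-annotate
            write = pending_writes.get(tick)       # step 908: write received?
            tick += 1
            if write is not None:
                target, data = write
                memory[target] = data              # step 910: apply immediately
                annotated[target] = True           # step 912: mark for recopy
    return snapshot, passes


memory = ["a", "b", "c", "d"]
snap, passes = snapshot_multi_pass(memory, {1: (0, "A"), 2: (3, "D")})
```

In this run, block 0 is written after it was copied, so it is re-annotated and copied again in a second pass, while block 3 is written before it is copied and its modified contents are picked up in the first pass. The finished snapshot matches the contents of memory at completion, in contrast to methods 500 and 700, which hold writes until the affected block has been copied.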

As described above, any of methods 500, 700, or 900 can be performed to take a snapshot of virtual memory 122 without pausing virtual machine 120. In methods 500 and 700, the snapshot is a pre-write snapshot. That is, write requests or operations received after the snapshot operation has been initiated (e.g., received while the snapshot operation is in progress) are not accounted for in the outstanding snapshot, but are accounted for in the subsequent snapshot. The data captured in the snapshot is consistent with the contents of virtual memory 122 prior to initiation of the snapshot operation. In method 900, the snapshot is a post-write snapshot. That is, write requests or operations received after the snapshot operation has been initiated (e.g., received while the snapshot operation is in progress) are accounted for in the outstanding snapshot. The data captured in the snapshot is consistent with the contents of virtual memory 122 at completion of the snapshot operation.

Exemplary Virtualization System Architectures

According to some embodiments, all or portions of any of the foregoing techniques described with respect to FIGS. 1-9B can be partitioned into one or more modules and instanced within, or as, or in conjunction with a virtualized controller in a virtual computing environment. Some example instances within various virtual computing environments are shown and discussed in further detail in FIGS. 10A-10D. Consistent with these embodiments, a virtualized controller includes a collection of software instructions that serve to abstract details of underlying hardware or software components from one or more higher-level processing entities. In some embodiments, a virtualized controller can be implemented as a virtual machine, as an executable container, or within a layer (e.g., such as a layer in a hypervisor). Consistent with these embodiments, distributed systems include collections of interconnected components that are designed for, or dedicated to, storage operations as well as being designed for, or dedicated to, computing and/or networking operations.

In some embodiments, interconnected components in a distributed system can operate cooperatively to achieve a particular objective such as to provide high-performance computing, high-performance networking capabilities, and/or high-performance storage and/or high-capacity storage capabilities. For example, a first set of components of a distributed computing system can coordinate to efficiently use a set of computational or compute resources, while a second set of components of the same distributed computing system can coordinate to efficiently use the same or a different set of data storage facilities.

In some embodiments, a hyperconverged system coordinates the efficient use of compute and storage resources by and between the components of the distributed system. Adding a hyperconverged unit to a hyperconverged system expands the system in multiple dimensions. As an example, adding a hyperconverged unit to a hyperconverged system can expand the system in the dimension of storage capacity while concurrently expanding the system in the dimension of computing capacity and also in the dimension of networking bandwidth. Components of any of the foregoing distributed systems can comprise physically and/or logically distributed autonomous entities.

In some embodiments, physical and/or logical collections of such autonomous entities can sometimes be referred to as nodes. In some hyperconverged systems, compute and storage resources can be integrated into a unit of a node. Multiple nodes can be interrelated into an array of nodes, which nodes can be grouped into physical groupings (e.g., arrays) and/or into logical groupings or topologies of nodes (e.g., spoke-and-wheel topologies, rings, etc.). Some hyperconverged systems implement certain aspects of virtualization. For example, in a hypervisor-assisted virtualization environment, certain of the autonomous entities of a distributed system can be implemented as virtual machines. As another example, in some virtualization environments, autonomous entities of a distributed system can be implemented as executable containers. In some systems and/or environments, hypervisor-assisted virtualization techniques and operating system virtualization techniques are combined.

FIG. 10A is a block diagram illustrating virtualization system architecture 10A00 configured to implement one or more aspects of the present embodiments. As shown in FIG. 10A, virtualization system architecture 10A00 includes a collection of interconnected components, including a controller virtual machine (CVM) instance 1030 in a configuration 1051. Configuration 1051 includes a computing platform 1006 that supports virtual machine instances that are deployed as user virtual machines, or controller virtual machines, or both. Such virtual machines interface with a hypervisor (as shown). In some examples, virtual machines may include processing of storage I/O (input/output or IO) as received from any or every source within the computing platform. An example implementation of such a virtual machine that processes storage I/O is depicted as CVM instance 1030.

In this and other configurations, a CVM instance receives block I/O storage requests as network file system (NFS) requests in the form of NFS requests 1002, Internet Small Computer Systems Interface (iSCSI) block IO requests in the form of iSCSI requests 1003, Server Message Block (SMB) requests in the form of SMB requests 1004, and/or the like. The CVM instance publishes and responds to an internet protocol (IP) address (e.g., CVM IP address 1010). Various forms of input and output can be handled by one or more IO control handler functions (e.g., IOCTL handler functions 1008) that interface to other functions such as data IO manager functions 1014 and/or metadata manager functions 1022. As shown, the data IO manager functions can include communication with virtual disk configuration manager 1012 and/or can include direct or indirect communication with any of various block IO functions (e.g., NFS IO, iSCSI IO, SMB IO, etc.).

In addition to block IO functions, configuration 1051 supports IO of any form (e.g., block IO, streaming IO, packet-based IO, HTTP traffic, etc.) through either or both of a user interface (UI) handler such as UI IO handler 1040 and/or through any of a range of application programming interfaces (APIs), possibly through API IO manager 1045.

Communications link 1015 can be configured to transmit (e.g., send, receive, signal, etc.) any type of communications packets comprising any organization of data items. The data items can comprise payload data, a destination address (e.g., a destination IP address), and a source address (e.g., a source IP address), and can include various packet processing techniques (e.g., tunneling), encodings (e.g., encryption), formatting of bit fields into fixed-length blocks or into variable-length fields used to populate the payload, and/or the like. In some cases, packet characteristics include a version identifier, a packet or payload length, a traffic class, a flow label, etc. In some cases, the payload comprises a data structure that is encoded and/or formatted to fit into byte or word boundaries of the packet.
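As a non-limiting illustration of formatting bit fields into a fixed-length block, the following minimal sketch packs a version identifier, a traffic class, and a flow label into one 32-bit word, followed by a 16-bit payload length, a source address, a destination address, and the payload. The field widths follow the first word of an IPv6 header, but the overall layout, the names build_packet and parse_packet, and the use of IPv6-style addresses are assumptions made only for this example; they do not describe a wire format used by communications link 1015.

```python
import ipaddress
import struct

# Hypothetical fixed-length header: one 32-bit word of bit fields plus a
# 16-bit payload length, both in network (big-endian) byte order.
HEADER = struct.Struct("!IH")
ADDR_LEN = 16  # assumption: 128-bit (IPv6-style) addresses


def build_packet(src, dst, payload, version=6, traffic_class=0, flow_label=0):
    """Pack the header fields, both addresses, and the payload into bytes."""
    if not (0 <= version < 16 and 0 <= traffic_class < 256
            and 0 <= flow_label < 2 ** 20):
        raise ValueError("header field out of range")
    if len(payload) > 0xFFFF:
        raise ValueError("payload too long for a 16-bit length field")
    # version: 4 bits, traffic class: 8 bits, flow label: 20 bits
    first_word = (version << 28) | (traffic_class << 20) | flow_label
    return (HEADER.pack(first_word, len(payload))
            + ipaddress.IPv6Address(src).packed
            + ipaddress.IPv6Address(dst).packed
            + payload)


def parse_packet(packet):
    """Recover the fields written by build_packet."""
    first_word, length = HEADER.unpack_from(packet, 0)
    src_at = HEADER.size
    dst_at = src_at + ADDR_LEN
    data_at = dst_at + ADDR_LEN
    return {
        "version": first_word >> 28,
        "traffic_class": (first_word >> 20) & 0xFF,
        "flow_label": first_word & 0xFFFFF,
        "source": str(ipaddress.IPv6Address(packet[src_at:dst_at])),
        "destination": str(ipaddress.IPv6Address(packet[dst_at:data_at])),
        "payload": packet[data_at:data_at + length],
    }
```

In this sketch the payload is carried as opaque bytes, so any encoding or encryption of the payload would be applied before build_packet is called.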

In some embodiments, hard-wired circuitry may be used in place of, or in combination with, software instructions to implement aspects of the disclosure. Thus, embodiments of the disclosure are not limited to any specific combination of hardware circuitry and/or software. In embodiments, the term “logic” shall mean any combination of software or hardware that is used to implement all or part of the disclosure.

Computing platform 1006 includes one or more computer readable media that are capable of providing instructions to a data processor for execution. In some examples, each of the computer readable media may take many forms including, but not limited to, non-volatile media and volatile media. Non-volatile media includes any non-volatile storage medium, for example, solid state storage devices (SSDs) or optical or magnetic disks such as hard disk drives (HDDs) or hybrid disk drives, or random access persistent memories (RAPMs) or optical or magnetic media drives such as paper tape or magnetic tape drives. Volatile media includes dynamic memory such as random access memory (RAM). As shown, controller virtual machine instance 1030 includes content cache manager facility 1016 that accesses storage locations, possibly including local dynamic random access memory (DRAM) (e.g., through local memory device access block 1018) and/or possibly including accesses to local solid state storage (e.g., through local SSD device access block 1020).

Common forms of computer readable media include any non-transitory computer readable medium, for example, floppy disk, flexible disk, hard disk, magnetic tape, or any other magnetic medium; CD-ROM or any other optical medium; punch cards, paper tape, or any other physical medium with patterns of holes; or any RAM, PROM, EPROM, FLASH-EPROM, or any other memory chip or cartridge. Any data can be stored, for example, in any form of data repository 1031, which in turn can be formatted into any one or more storage areas, and which can comprise parameterized storage accessible by a key (e.g., a filename, a table name, a block address, an offset address, etc.). Data repository 1031 can store any forms of data, and may comprise a storage area dedicated to storage of metadata pertaining to the stored forms of data. In some cases, metadata can be divided into portions. Such portions and/or cache copies can be stored in the storage data repository and/or in a local storage area (e.g., in local DRAM areas and/or in local SSD areas). Such local storage can be accessed using functions provided by local metadata storage access block 1024. The data repository 1031 can be configured using CVM virtual disk controller 1026, which can in turn manage any number or any configuration of virtual disks.

Execution of a sequence of instructions to practice certain of the disclosed embodiments is performed by one or more instances of a software instruction processor, or a processing element such as a data processor, or such as a central processing unit (e.g., CPU1, CPU2, CPUN). According to certain embodiments of the disclosure, two or more instances of configuration 1051 can be coupled by communications link 1015 (e.g., backplane, LAN, PSTN, wired or wireless network, etc.) and each instance may perform respective portions of sequences of instructions as may be required to practice embodiments of the disclosure.

The shown computing platform 1006 is interconnected to the Internet 1048 through one or more network interface ports (e.g., network interface port 1023₁ and network interface port 1023₂). Configuration 1051 can be addressed through one or more network interface ports using an IP address. Any operational element within computing platform 1006 can perform sending and receiving operations using any of a range of network protocols, possibly including network protocols that send and receive packets (e.g., network protocol packet 1021₁ and network protocol packet 1021₂).

Computing platform 1006 may transmit and receive messages that can be composed of configuration data and/or any other forms of data and/or instructions organized into a data structure (e.g., communications packets). In some cases, the data structure includes program instructions (e.g., application code) communicated through the Internet 1048 and/or through any one or more instances of communications link 1015. Received program instructions may be processed and/or executed by a CPU as they are received, and/or program instructions may be stored in any volatile or non-volatile storage for later execution. Program instructions can be transmitted via an upload (e.g., an upload from an access device over the Internet 1048 to computing platform 1006). Further, program instructions and/or the results of executing program instructions can be delivered to a particular user via a download (e.g., a download from computing platform 1006 over the Internet 1048 to an access device).

Configuration 1051 is merely one example configuration. Other configurations or partitions can include further data processors, and/or multiple communications interfaces, and/or multiple storage devices, etc. within a partition. For example, a partition can bound a multi-core processor (e.g., possibly including embedded or collocated memory), or a partition can bound a computing cluster having a plurality of computing elements, any of which computing elements are connected directly or indirectly to a communications link. A first partition can be configured to communicate to a second partition. A particular first partition and a particular second partition can be congruent (e.g., in a processing element array) or can be different (e.g., comprising disjoint sets of components).

A cluster is often embodied as a collection of computing nodes that can communicate with each other through a local area network (e.g., LAN or virtual LAN (VLAN)) or a backplane. Some clusters are characterized by assignment of a particular set of the aforementioned computing nodes to access a shared storage facility that is also configured to communicate over the local area network or backplane. In many cases, the physical bounds of a cluster are defined by a mechanical structure such as a cabinet or such as a chassis or rack that hosts a finite number of mounted-in computing units. A computing unit in a rack can take on a role as a server, or as a storage unit, or as a networking unit, or any combination thereof. In some cases, a unit in a rack is dedicated to provisioning of power to other units. In some cases, a unit in a rack is dedicated to environmental conditioning functions such as filtering and movement of air through the rack and/or temperature control for the rack. Racks can be combined to form larger clusters. For example, the LAN of a first rack having a quantity of 32 computing nodes can be interfaced with the LAN of a second rack having 16 nodes to form a two-rack cluster of 48 nodes. The former two LANs can be configured as subnets, or can be configured as one VLAN. Multiple clusters can communicate with one another over a WAN (e.g., when geographically distal) or a LAN (e.g., when geographically proximal).

In some embodiments, a module can be implemented using any mix of any portions of memory and any extent of hard-wired circuitry including hard-wired circuitry embodied as a data processor. Some embodiments of a module include one or more special-purpose hardware components (e.g., power control, logic, sensors, transducers, etc.). A data processor can be organized to execute a processing entity that is configured to execute as a single process or configured to execute using multiple concurrent processes to perform work. A processing entity can be hardware-based (e.g., involving one or more cores) or software-based, and/or can be formed using a combination of hardware and software that implements logic, and/or can carry out computations and/or processing steps using one or more processes and/or one or more tasks and/or one or more threads or any combination thereof.

Some embodiments of a module include instructions that are stored in a memory for execution so as to facilitate operational and/or performance characteristics pertaining to management of block stores. Various implementations of the data repository comprise storage media organized to hold a series of records and/or data structures.

Further details regarding general approaches to managing data repositories are described in U.S. Pat. No. 8,601,473 titled “ARCHITECTURE FOR MANAGING I/O AND STORAGE FOR A VIRTUALIZATION ENVIRONMENT”, issued on Dec. 3, 2013, which is hereby incorporated by reference in its entirety.

Further details regarding general approaches to managing and maintaining data in data repositories are described in U.S. Pat. No. 8,549,518 titled “METHOD AND SYSTEM FOR IMPLEMENTING A MAINTENANCE SERVICE FOR MANAGING I/O AND STORAGE FOR A VIRTUALIZATION ENVIRONMENT”, issued on Oct. 1, 2013, which is hereby incorporated by reference in its entirety.

FIG. 10B depicts a block diagram illustrating another virtualization system architecture 10B00 configured to implement one or more aspects of the present embodiments. As shown in FIG. 10B, virtualization system architecture 10B00 includes a collection of interconnected components, including an executable container instance 1050 in a configuration 1052. Configuration 1052 includes a computing platform 1006 that supports an operating system layer (as shown) that performs addressing functions such as providing access to external requestors (e.g., user virtual machines or other processes) via an IP address (e.g., “P.Q.R.S”, as shown). Providing access to external requestors can include implementing all or portions of a protocol specification (e.g., “http:”) and possibly handling port-specific functions. In some embodiments, external requestors (e.g., user virtual machines or other processes) rely on the aforementioned addressing functions to access a virtualized controller for performing all data storage functions. Furthermore, when data input or output requests from a requestor running on a first node are received at the virtualized controller on that first node, then in the event that the requested data is located on a second node, the virtualized controller on the first node accesses the requested data by forwarding the request to the virtualized controller running at the second node. In some cases, a particular input or output request might be forwarded again (e.g., an additional or Nth time) to further nodes. As such, when responding to an input or output request, a first virtualized controller on the first node might communicate with a second virtualized controller on the second node, which second node has access to particular storage devices on the second node, or the virtualized controller on the first node may communicate directly with storage devices on the second node.
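The forwarding behavior described above can be sketched as follows. This is a minimal, hypothetical model rather than the disclosed implementation: the class VirtualizedController, its per-controller location map, and the max_hops guard are names and mechanisms assumed only for illustration. A controller serves a read from its own node when it holds the data, and otherwise forwards the request to the controller on the node that its location map names, which may forward the request again.

```python
class VirtualizedController:
    """Hypothetical per-node controller that serves or forwards read requests."""

    def __init__(self, node_id, local_blocks, location_map):
        self.node_id = node_id
        self.local_blocks = dict(local_blocks)   # block id -> data held on this node
        self.location_map = dict(location_map)   # block id -> node believed to hold it
        self.peers = {}                          # node id -> controller on that node

    def read(self, block_id, hops=0, max_hops=4):
        """Return (data, number of times the request was forwarded)."""
        if block_id in self.local_blocks:
            return self.local_blocks[block_id], hops
        if hops >= max_hops:
            raise LookupError(f"{block_id!r} not found within {max_hops} forwards")
        target = self.location_map.get(block_id)
        if target is None or target == self.node_id:
            raise KeyError(f"node {self.node_id} cannot locate {block_id!r}")
        # The data is on another node: forward to that node's controller.
        return self.peers[target].read(block_id, hops + 1, max_hops)


def connect(controllers):
    """Make every controller reachable from every other controller."""
    for controller in controllers:
        controller.peers = {c.node_id: c for c in controllers if c is not controller}
```

For example, if the first node's map names the second node for a block that has since moved to a third node, the request is forwarded twice before the data is returned. The alternative mentioned above, in which the first controller communicates directly with storage devices on the second node, is not modeled in this sketch.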

The operating system layer can perform port forwarding to any executable container (e.g., executable container instance 1050). An executable container instance can be executed by a processor. Runnable portions of an executable container instance sometimes derive from an executable container image, which in turn might include all, or portions of any of, a Java archive repository (JAR) and/or its contents, and/or a script or scripts and/or a directory of scripts, and/or a virtual machine configuration, and may include any dependencies therefrom. In some cases, a configuration within an executable container might include an image comprising a minimum set of runnable code. Contents of larger libraries and/or code or data that would not be accessed during runtime of the executable container instance can be omitted from the larger library to form a smaller library composed of only the code or data that would be accessed during runtime of the executable container instance. In some cases, start-up time for an executable container instance can be much faster than start-up time for a virtual machine instance, at least inasmuch as the executable container image might be much smaller than a respective virtual machine instance. Furthermore, start-up time for an executable container instance can be much faster than start-up time for a virtual machine instance, at least inasmuch as the executable container image might have many fewer code and/or data initialization steps to perform than a respective virtual machine instance.

An executable container instance can serve as an instance of an application container or as a controller executable container. Any executable container of any sort can be rooted in a directory system and can be configured to be accessed by file system commands (e.g., “ls” or “ls -a”, etc.). The executable container might optionally include operating system components 1078; however, such a separate set of operating system components need not be provided. As an alternative, an executable container can include runnable instance 1058, which is built (e.g., through compilation and linking, or just-in-time compilation, etc.) to include all of the library and OS-like functions needed for execution of the runnable instance. In some cases, a runnable instance can be built with a virtual disk configuration manager, any of a variety of data IO management functions, etc. In some cases, a runnable instance includes code for, and access to, container virtual disk controller 1076. Such a container virtual disk controller can perform any of the functions that the aforementioned CVM virtual disk controller 1026 can perform, yet such a container virtual disk controller does not rely on a hypervisor or any particular operating system so as to perform its range of functions.

In some environments, multiple executable containers can be collocated and/or can share one or more contexts. For example, multiple executable containers that share access to a virtual disk can be assembled into a pod (e.g., a Kubernetes pod). Pods provide sharing mechanisms (e.g., when multiple executable containers are amalgamated into the scope of a pod) as well as isolation mechanisms (e.g., such that the namespace scope of one pod does not share the namespace scope of another pod).

FIG. 10C is a block diagram illustrating virtualization system architecture 10C00 configured to implement one or more aspects of the present embodiments. As shown in FIG. 10C, virtualization system architecture 10C00 includes a collection of interconnected components, including a user executable container instance in configuration 1053 that is further described as pertaining to user executable container instance 1070. Configuration 1053 includes a daemon layer (as shown) that performs certain functions of an operating system.

User executable container instance 1070 comprises any number of user containerized functions (e.g., user containerized function1, user containerized function2, . . . , user containerized functionN). Such user containerized functions can execute autonomously or can be interfaced with or wrapped in a runnable object to create a runnable instance (e.g., runnable instance 1058). In some cases, the shown operating system components 1078 comprise portions of an operating system, which portions are interfaced with or included in the runnable instance and/or any user containerized functions. In some embodiments of a daemon-assisted containerized architecture, computing platform 1006 might or might not host operating system components other than operating system components 1078. More specifically, the shown daemon might or might not host operating system components other than operating system components 1078 of user executable container instance 1070.

In some embodiments, the virtualization system architecture 10A00, 10B00, and/or 10C00 can be used in any combination to implement a distributed platform that contains multiple servers and/or nodes that manage multiple tiers of storage where the tiers of storage might be formed using the shown data repository 1031 and/or any forms of network accessible storage. As such, the multiple tiers of storage may include storage that is accessible over communications link 1015. Such network accessible storage may include cloud storage or networked storage (e.g., a SAN or storage area network). Unlike prior approaches, the disclosed embodiments permit local storage that is within or directly attached to the server or node to be managed as part of a storage pool. Such local storage can include any combinations of the aforementioned SSDs and/or HDDs and/or RAPMs and/or hybrid disk drives. The address spaces of a plurality of storage devices, including both local storage (e.g., using node-internal storage devices) and any forms of network-accessible storage, are collected to form a storage pool having a contiguous address space.
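One way to picture a storage pool having a contiguous address space is to lay the address ranges of the individual devices end to end and translate each pool address into a device and an offset within that device. The following minimal sketch does so under stated assumptions: the class StoragePool, the device names, and the simple concatenation order are hypothetical and chosen only for illustration; the disclosure does not specify how the pool's address space is laid out.

```python
from bisect import bisect_right
from itertools import accumulate


class StoragePool:
    """Hypothetical pool that concatenates device address spaces end to end."""

    def __init__(self, devices):
        # devices: sequence of (name, capacity_in_bytes), local or networked
        self.devices = list(devices)
        # ends[i] is the first pool address past device i
        self.ends = list(accumulate(capacity for _, capacity in self.devices))

    @property
    def capacity(self):
        return self.ends[-1] if self.ends else 0

    def resolve(self, pool_address):
        """Translate a pool address into (device name, offset on that device)."""
        if not 0 <= pool_address < self.capacity:
            raise IndexError("address outside the storage pool")
        index = bisect_right(self.ends, pool_address)
        name, device_capacity = self.devices[index]
        device_start = self.ends[index] - device_capacity
        return name, pool_address - device_start
```

With a 100-byte local SSD, a 200-byte local HDD, and a 50-byte networked volume, for instance, pool address 100 resolves to offset 0 of the HDD, and the pool exposes 350 contiguous addresses in total.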

Significant performance advantages can be gained by allowing the virtualization system to access and utilize local (e.g., node-internal) storage. This is because I/O performance is typically much faster when performing access to local storage as compared to performing access to networked storage or cloud storage. This faster performance for locally attached storage can be increased even further by using certain types of optimized local storage devices such as SSDs or RAPMs, or hybrid HDDs, or other types of high-performance storage devices.

In some embodiments, each storage controller exports one or more block devices or NFS or iSCSI targets that appear as disks to user virtual machines or user executable containers. These disks are virtual since they are implemented by the software running inside the storage controllers. Thus, to the user virtual machines or user executable containers, the storage controllers appear to be exporting a clustered storage appliance that contains some disks. User data (including operating system components) in the user virtual machines resides on these virtual disks.

In some embodiments, any one or more of the aforementioned virtual disks can be structured from any one or more of the storage devices in the storage pool. In some embodiments, a virtual disk is a storage abstraction that is exposed by a controller virtual machine or container to be used by another virtual machine or container. In some embodiments, the virtual disk is exposed by operation of a storage protocol such as iSCSI or NFS or SMB. In some embodiments, a virtual disk is mountable. In some embodiments, a virtual disk is mounted as a virtual storage device.

In some embodiments, some or all of the servers or nodes run virtualization software. Such virtualization software might include a hypervisor (e.g., as shown in configuration 1051) to manage the interactions between the underlying hardware and user virtual machines or containers that run client software.

Distinct from user virtual machines or user executable containers, a special controller virtual machine (e.g., as depicted by controller virtual machine instance 1030) or a special controller executable container is used to manage certain storage and I/O activities. Such a special controller virtual machine is sometimes referred to as a controller executable container, a service virtual machine (SVM), a service executable container, or a storage controller. In some embodiments, multiple storage controllers are hosted by multiple nodes. Such storage controllers coordinate within a computing system to form a computing cluster.

The storage controllers are not formed as part of specific implementations of hypervisors. Instead, the storage controllers run above hypervisors on the various nodes and work together to form a distributed system that manages all of the storage resources, including the locally attached storage, the networked storage, and the cloud storage. In example embodiments, the storage controllers run as special virtual machines above the hypervisors; thus, the approach of using such special virtual machines can be used and implemented within any virtual machine architecture. Furthermore, the storage controllers can be used in conjunction with any hypervisor from any virtualization vendor and/or implemented using any combinations or variations of the aforementioned executable containers in conjunction with any host operating system components.

FIG. 10D is a block diagram illustrating virtualization system architecture 10D00 configured to implement one or more aspects of the present embodiments. As shown in FIG. 10D, virtualization system architecture 10D00 includes a distributed virtualization system that includes multiple clusters (e.g., cluster 1083₁, . . . , cluster 1083_N) comprising multiple nodes that have multiple tiers of storage in a storage pool. Representative nodes (e.g., node 1081₁₁, . . . , node 1081_1M) and storage pool 1090 associated with cluster 1083₁ are shown. Each node can be associated with one server, multiple servers, or portions of a server. The nodes can be associated (e.g., logically and/or physically) with the clusters. As shown, the multiple tiers of storage include storage that is accessible through a network 1096, such as a networked storage 1086 (e.g., a storage area network or SAN, network attached storage or NAS, etc.). The multiple tiers of storage further include instances of local storage (e.g., local storage 1091₁₁, . . . , local storage 1091_1M). For example, the local storage can be within or directly attached to a server and/or appliance associated with the nodes. Such local storage can include solid state drives (SSD 1093₁₁, . . . , SSD 1093_1M), hard disk drives (HDD 1094₁₁, . . . , HDD 1094_1M), and/or other storage devices.

As shown, any of the nodes of the distributed virtualization system can implement one or more user virtualized entities (e.g., VE 1088₁₁₁, . . . , VE 1088_11K, . . . , VE 1088_1M1, . . . , VE 1088_1MK), such as virtual machines (VMs) and/or executable containers. The VMs can be characterized as software-based computing “machines” implemented in a container-based or hypervisor-assisted virtualization environment that emulates the underlying hardware resources (e.g., CPU, memory, etc.) of the nodes. For example, multiple VMs can operate on one physical machine (e.g., node host computer) running a single host operating system (e.g., host operating system 1087₁₁, . . . , host operating system 1087_1M), while the VMs run multiple applications on various respective guest operating systems. Such flexibility can be facilitated at least in part by a hypervisor (e.g., hypervisor 1085₁₁, . . . , hypervisor 1085_1M), which hypervisor is logically located between the various guest operating systems of the VMs and the host operating system of the physical infrastructure (e.g., node).

As an alternative, executable containers may be implemented at the nodes in an operating system-based virtualization environment or in a containerized virtualization environment. The executable containers can include groups of processes and/or resources (e.g., memory, CPU, disk, etc.) that are isolated from the node host computer and other containers. Such executable containers directly interface with the kernel of the host operating system (e.g., host operating system 1087₁₁, . . . , host operating system 1087_1M) without, in most cases, a hypervisor layer. This lightweight implementation can facilitate efficient distribution of certain software components, such as applications or services (e.g., micro-services). Any node of a distributed virtualization system can implement both a hypervisor-assisted virtualization environment and a container virtualization environment for various purposes. Also, any node of a distributed virtualization system can implement any one or more types of the foregoing virtualized controllers so as to facilitate access to storage pool 1090 by the VMs and/or the executable containers.

Multiple instances of such virtualized controllers can coordinate within a cluster to form the distributed storage system 1092 which can, among other operations, manage the storage pool 1090. This architecture further facilitates efficient scaling in multiple dimensions (e.g., in a dimension of computing power, in a dimension of storage space, in a dimension of network bandwidth, etc.).

In some embodiments, a particularly-configured instance of a virtual machine at a given node can be used as a virtualized controller in a hypervisor-assisted virtualization environment to manage storage and I/O (input/output or IO) activities of any number or form of virtualized entities. For example, the virtualized entities at node 1081₁₁ can interface with a controller virtual machine (e.g., virtualized controller 1082₁₁) through hypervisor 1085₁₁ to access data of storage pool 1090. In such cases, the controller virtual machine is not formed as part of specific implementations of a given hypervisor. Instead, the controller virtual machine can run as a virtual machine above the hypervisor at the various node host computers. When the controller virtual machines run above the hypervisors, varying virtual machine architectures and/or hypervisors can operate with the distributed storage system 1092. For example, a hypervisor at one node in the distributed storage system 1092 might correspond to software from a first vendor, and a hypervisor at another node in the distributed storage system 1092 might correspond to software from a second vendor. As another virtualized controller implementation example, executable containers can be used to implement a virtualized controller (e.g., virtualized controller 1082_1M) in an operating system virtualization environment at a given node. In this case, for example, the virtualized entities at node 1081_1M can access the storage pool 1090 by interfacing with a controller container (e.g., virtualized controller 1082_1M) through hypervisor 1085_1M and/or the kernel of host operating system 1087_1M.

In some embodiments, one or more instances of an agent can be implemented in the distributed storage system 1092 to facilitate the herein disclosed techniques. Specifically, agent 1084₁₁ can be implemented in the virtualized controller 1082₁₁, and agent 1084_1M can be implemented in the virtualized controller 1082_1M. Such instances of the virtualized controller can be implemented in any node in any cluster. Actions taken by one or more instances of the virtualized controller can apply to a node (or between nodes), and/or to a cluster (or between clusters), and/or between any resources or subsystems accessible by the virtualized controller or their agents.

Exemplary Computer System

FIG. 11 is a block diagram illustrating a computer system 1100 configured to implement one or more aspects of the present embodiments. In some embodiments, computer system 1100 may be representative of a computer system for implementing one or more aspects of the embodiments disclosed in FIGS. 1-10D. In some embodiments, computer system 1100 is a server machine, operating in a data center or a cloud computing environment, that is suitable for implementing an embodiment of the present disclosure. As shown, computer system 1100 includes a bus 1102 or other communication mechanism for communicating information, which interconnects subsystems and devices, such as one or more processors 1104, memory 1106, storage 1108, optional display 1110, one or more input/output devices 1112, and a communications interface 1114. Computer system 1100 described herein is illustrative and any other technically feasible configurations fall within the scope of the present disclosure.

The one or more processors 1104 include any suitable processors implemented as a central processing unit (CPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), an artificial intelligence (AI) accelerator, any other type of processor, or a combination of different processors, such as a CPU configured to operate in conjunction with a GPU. In general, the one or more processors 1104 may be any technically feasible hardware unit capable of processing data and/or executing software applications. Further, in the context of this disclosure, the computing elements shown in computer system 1100 may correspond to a physical computing system (e.g., a system in a data center) or may be a virtual computing instance, such as any of the virtual machines described in FIGS. 10A-10D.

Memory 1106 includes a random access memory (RAM) module, a flash memory unit, and/or any other type of memory unit or combination thereof. The one or more processors 1104 and/or communications interface 1114 are configured to read data from and write data to memory 1106. Memory 1106 includes various software programs that include one or more instructions that can be executed by the one or more processors 1104 and application data associated with said software programs.

Storage 1108 includes non-volatile storage for applications and data, and may include one or more fixed or removable disk drives, HDDs, SSDs, NVMes, vDisks, flash memory devices, and/or other magnetic, optical, and/or solid state storage devices.

Communications interface 1114 includes hardware and/or software for coupling computer system 1100 to one or more communication links 1115. The one or more communication links 1115 may include any technically feasible type of communications network that allows data to be exchanged between computer system 1100 and external entities or devices, such as a web server or another networked computing system. For example, the one or more communication links 1115 may include one or more wide area networks (WANs), one or more local area networks (LANs), one or more wireless (WiFi) networks, the Internet, and/or the like.

In sum, a snapshot of a virtual memory of a virtual machine can be taken to a persistent memory. The snapshot can be taken with or without pausing the virtual machine. In some embodiments, a write request that is received during the snapshot operation can be restricted from being applied until after the associated portion of the virtual memory is copied to the snapshot. A portion of the virtual memory with an outstanding write request can be copied to the snapshot ahead of a normal order of copying. In some embodiments, portions of the virtual memory can be annotated for copying and un-annotated when copied, but can be re-annotated for copying if a write request is received for the portion during the snapshot operation.
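
By way of illustration only, the write-deferring variant summarized above (a write is held back until its portion is copied, and that portion is copied ahead of the normal order) can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions, not an implementation of any embodiment: the class `PrioritySnapshot`, its methods, the use of a list for the virtual memory, and the use of a dictionary as a stand-in for blocks in persistent memory are all hypothetical names and simplifications introduced for this example.

```python
from collections import deque


class PrioritySnapshot:
    """Toy model (hypothetical): portions are list items, persistent memory is a dict."""

    def __init__(self, pages):
        self.pages = pages                        # live virtual memory, one item per portion
        self.snapshot = {}                        # stand-in for blocks in persistent memory
        self.read_only = set(range(len(pages)))   # portions not yet copied are write-protected
        self.priority = deque()                   # portions with an outstanding trapped write
        self.deferred = {}                        # portion index -> writes waiting to be applied
        self.copy_order = []                      # order in which portions reached the snapshot

    def write(self, index, value):
        """Apply the write if the portion is writable; otherwise trap and prioritize it."""
        if index in self.read_only:
            if index not in self.deferred:
                self.priority.append(index)       # first trapped write queues the portion
            self.deferred.setdefault(index, []).append(value)
        else:
            self.pages[index] = value

    def _copy(self, index):
        self.snapshot[index] = self.pages[index]  # copy the pre-write contents
        self.copy_order.append(index)
        self.read_only.discard(index)             # restore write permission
        for value in self.deferred.pop(index, []):
            self.pages[index] = value             # now apply the trapped writes in order

    def step(self):
        """Copy one portion: a prioritized one if any, else the lowest remaining address."""
        if self.priority:
            self._copy(self.priority.popleft())
            return True
        if self.read_only:
            self._copy(min(self.read_only))
            return True
        return False                              # nothing left to copy


snap = PrioritySnapshot(["a0", "b0", "c0", "d0"])
snap.step()             # copies portion 0 in address order
snap.write(3, "d1")     # portion 3 is still read-only: trapped and prioritized
snap.write(0, "a1")     # portion 0 was already copied: applied immediately
while snap.step():
    pass
```

In this run the portions reach the snapshot in the order 0, 3, 1, 2, the snapshot holds the contents as they were when the operation began, and the live memory ends with both writes applied. A real system would rely on page permissions and fault handling rather than an explicit `write` method.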

At least one technical advantage of the disclosed techniques relative to the prior art is that the taking of a snapshot of virtual memory of a virtual machine no longer requires a pause of the VM and/or a suspension of access. Accordingly, snapshots of the virtual memory can be taken more often with reduced or minimal impact on the performance of the VM compared to conventional approaches. More frequent snapshots provide more protection for the consistency of data included in the virtual machine. These technical advantages provide one or more technological advancements or improvements over prior art approaches.

1. In some embodiments, one or more non-transitory computer-readable media store program instructions that, when executed by one or more processors, cause the one or more processors to perform steps of allocating, in a persistent memory, one or more blocks associated with a virtual memory; detecting a write request associated with a first portion of the virtual memory; in response to detecting the write request associated with the first portion, prioritizing the first portion; based on the prioritizing, copying the first portion into the one or more blocks in the persistent memory ahead of a second portion of the virtual memory; and after copying the first portion: applying the write request to the first portion; and copying the second portion into the one or more blocks in the persistent memory.

2. The one or more non-transitory computer-readable media of clause 1, wherein the steps further comprise, before copying the first portion and the second portion, setting a permission of the first portion and a permission of the second portion to read-only.

3. The one or more non-transitory computer-readable media of clauses 1 or 2, wherein the second portion is ahead of the first portion in a copying order prior to the prioritizing.

4. The one or more non-transitory computer-readable media of any of clauses 1-3, wherein prioritizing the first portion comprises queueing the first portion in a priority queue.

5. The one or more non-transitory computer-readable media of any of clauses 1-4, wherein detecting the write request associated with the first portion comprises receiving the write request while a permission of the first portion is read-only.

6. The one or more non-transitory computer-readable media of any of clauses 1-5, wherein detecting the write request associated with the first portion comprises trapping the write request before copying the first portion.

7. The one or more non-transitory computer-readable media of any of clauses 1-6, wherein the steps further comprise, after copying the first portion, restoring a permission of the first portion.

8. The one or more non-transitory computer-readable media of any of clauses 1-7, wherein the second portion is copied according to a predefined order of portions of the virtual memory.

9. The one or more non-transitory computer-readable media of any of clauses 1-8, wherein the second portion is copied according to an address order of portions of the virtual memory.

10. In some embodiments, a method for taking a snapshot of a virtual memory of a virtual machine comprises allocating, in a persistent memory, one or more blocks associated with a virtual memory; detecting a write request associated with a first portion of the virtual memory; in response to detecting the write request associated with the first portion, prioritizing the first portion; based on the prioritizing, copying the first portion into the one or more blocks in the persistent memory ahead of a second portion of the virtual memory; and after copying the first portion: applying the write request to the first portion; and copying the second portion into the one or more blocks in the persistent memory.

11. The method of clause 10, further comprising, before copying the first portion and the second portion, setting a permission of the first portion and a permission of the second portion to read-only.

12. The method of clauses 10 or 11, wherein the second portion is ahead of the first portion in a copying order prior to the prioritizing.

13. The method of any of clauses 10-12, wherein prioritizing the first portion comprises queueing the first portion in a priority queue.

14. The method of any of clauses 10-13, wherein detecting the write request associated with the first portion comprises one or more of receiving the write request while a permission of the first portion is read-only, or trapping the write request before copying the first portion.

15. The method of any of clauses 10-14, further comprising, after copying the first portion, restoring a permission of the first portion.

16. The method of any of clauses 10-15, wherein the second portion is copied according to an address order of portions of the virtual memory.

17. In some embodiments, a system comprises a memory storing a set of instructions; and one or more processors that, when executing the set of instructions, are configured to allocate, in a persistent memory, one or more blocks associated with a virtual memory; detect a write request associated with a first portion of the virtual memory; in response to detecting the write request associated with the first portion, prioritize the first portion; based on the prioritizing, copy the first portion into the one or more blocks in the persistent memory ahead of a second portion of the virtual memory; and after copying the first portion: apply the write request to the first portion; and copy the second portion into the one or more blocks in the persistent memory.

18. The system of clause 17, wherein the one or more processors, when executing the set of instructions, are further configured to, before copying the first portion and the second portion, set a permission of the first portion and a permission of the second portion to read-only.

19. The system of clauses 17 or 18, wherein the second portion is ahead of the first portion in a copying order prior to the prioritizing.

20. The system of any of clauses 17-19, wherein the one or more processors, when executing the set of instructions, are further configured to queue the first portion in a priority queue.

21. The system of any of clauses 17-20, wherein the one or more processors, when executing the set of instructions, are further configured to one or more of receive the write request while a permission of the first portion is read-only, or trap the write request before copying the first portion.

22. The system of any of clauses 17-21, wherein the one or more processors, when executing the set of instructions, are further configured to, after copying the first portion, restore a permission of the first portion.

23. The system of any of clauses 17-22, wherein the second portion is copied according to an address order of portions of the virtual memory.

24. In some embodiments, one or more non-transitory computer-readable media store program instructions that, when executed by one or more processors, cause the one or more processors to perform steps of allocating, in a persistent memory, one or more blocks associated with a virtual memory; annotating a first portion of the virtual memory for copying in a first pass; copying the first portion into the one or more blocks in the persistent memory in the first pass; receiving a write request associated with the first portion; and in response to receiving the write request: applying the write request to the first portion; and annotating the first portion for copying in a second pass subsequent to the first pass.

25. The one or more non-transitory computer-readable media of clause 24, wherein the steps further comprise copying the first portion into the one or more blocks in the persistent memory in the second pass.

26. The one or more non-transitory computer-readable media of clauses 24 or 25, wherein the steps further comprise, after the first portion is copied into the one or more blocks in the persistent memory in the second pass, un-annotating the first portion.

27. The one or more non-transitory computer-readable media of any of clauses 24-26, wherein the steps further comprise un-annotating the first portion after copying the first portion in the first pass.

28. The one or more non-transitory computer-readable media of any of clauses 24-27, wherein the steps further comprise receiving a write request associated with a second portion of the virtual memory, wherein second portion is annotated for copying in the first pass; and applying the write request to the second portion.

29. The one or more non-transitory computer-readable media of any of clauses 24-28, wherein the steps further comprise, after the second portion is copied into the one or more blocks in the persistent memory, un-annotating the second portion.

30. The one or more non-transitory computer-readable media of any of clauses 24-29, wherein the steps further comprise, in response to determining that at least one portion of the virtual memory is annotated, copying the at least one portion of the virtual memory into the persistent memory in a subsequent pass.

31. The one or more non-transitory computer-readable media of any of clauses 24-30, wherein the steps further comprise, in response to determining that no portion of the virtual memory is annotated, ceasing copying of the virtual memory into the persistent memory.

32. In some embodiments, a method for taking a snapshot of a virtual memory of a virtual machine comprises allocating, in a persistent memory, one or more blocks associated with a virtual memory; annotating a first portion of the virtual memory for copying in a first pass; copying the first portion into the one or more blocks in the persistent memory in the first pass; receiving a write request associated with the first portion; and in response to receiving the write request: applying the write request to the first portion; and annotating the first portion for copying in a second pass subsequent to the first pass.

33. The method of clause 32, further comprising copying the first portion into the one or more blocks in the persistent memory in the second pass.

34. The method of clauses 32 or 33, further comprising, after the first portion is copied into the one or more blocks in the persistent memory in the second pass, un-annotating the first portion.

35. The method of any of clauses 32-34, further comprising un-annotating the first portion after copying the first portion in the first pass.

36. The method of any of clauses 32-35, further comprising receiving a write request associated with a second portion of the virtual memory, wherein second portion is annotated for copying in the first pass; and applying the write request to the second portion.

37. The method of any of clauses 32-36, further comprising, after the second portion is copied into the one or more blocks in the persistent memory, un-annotating the second portion.

38. The method of any of clauses 32-37, further comprising, in response to determining that at least one portion of the virtual memory is annotated, copying the at least one portion of the virtual memory into the persistent memory in a subsequent pass; and in response to determining that no portion of the virtual memory is annotated, ceasing copying of the virtual memory into the persistent memory.

39. In some embodiments, a system comprises a memory storing a set of instructions; and one or more processors that, when executing the set of instructions, are configured to allocate, in a persistent memory, one or more blocks associated with a virtual memory; annotate a first portion of the virtual memory for copying in a first pass; copy the first portion into the one or more blocks in the persistent memory in the first pass; receive a write request associated with the first portion; and in response to receiving the write request: apply the write request to the first portion; and annotate the first portion for copying in a second pass subsequent to the first pass.

40. The system of clause 39, wherein the one or more processors, when executing the set of instructions, are further configured to copy the first portion into the one or more blocks in the persistent memory in the second pass.

41. The system of clauses 39 or 40, wherein the one or more processors, when executing the set of instructions, are further configured to, after the first portion is copied into the one or more blocks in the persistent memory in the second pass, un-annotate the first portion.

42. The system of any of clauses 39-41, wherein the one or more processors, when executing the set of instructions, are further configured to un-annotate the first portion after copying the first portion in the first pass.

43. The system of any of clauses 39-42, wherein the one or more processors, when executing the set of instructions, are further configured to receive a write request associated with a second portion of the virtual memory, wherein second portion is annotated for copying in the first pass; and apply the write request to the second portion.

44. The system of any of clauses 39-43, wherein the one or more processors, when executing the set of instructions, are further configured to, after the second portion is copied into the one or more blocks in the persistent memory, un-annotate the second portion.

45. The system of any of clauses 39-44, wherein the one or more processors, when executing the set of instructions, are further configured to in response to determining that at least one portion of the virtual memory is annotated, copy the at least one portion of the virtual memory into the persistent memory in a subsequent pass; and in response to determining that no portion of the virtual memory is annotated, cease copying of the virtual memory into the persistent memory.
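
By way of illustration only, the multi-pass annotation approach recited in clauses 24-45 can be modeled as follows. This is a minimal sketch under stated assumptions rather than an implementation of any embodiment: the class `MultiPassSnapshot`, its methods, the `on_copied` callback used to simulate guest writes arriving mid-pass, and the use of Python lists for the virtual memory and the persistent-memory blocks are all hypothetical and introduced only for this example.

```python
class MultiPassSnapshot:
    """Toy model (hypothetical): annotate portions, copy them, re-annotate on write."""

    def __init__(self, pages):
        self.pages = pages                           # live virtual memory, one item per portion
        self.snapshot = [None] * len(pages)          # stand-in for blocks in persistent memory
        self.annotated = set(range(len(pages)))      # every portion is annotated for the first pass

    def write(self, index, value):
        """Apply the write at once and annotate the portion for copying."""
        self.pages[index] = value
        # If the portion is still awaiting the current pass this is a no-op and the
        # pass will pick up the new value; if it was already copied, the annotation
        # causes it to be copied again in the next pass.
        self.annotated.add(index)

    def run_pass(self, on_copied=None):
        """Copy every portion that was annotated when the pass began."""
        for index in sorted(self.annotated):         # sorted() takes a fixed copy of the set
            self.snapshot[index] = self.pages[index]
            self.annotated.discard(index)            # un-annotate once copied
            if on_copied is not None:
                on_copied(index)                     # lets the example inject guest writes


mem = MultiPassSnapshot(["a0", "b0", "c0"])


def guest_activity(index):
    # Assumed guest behavior for the example: right after portion 0 is copied,
    # the guest writes to portion 0 (already copied) and portion 2 (not yet copied).
    if index == 0:
        mem.write(0, "a1")
        mem.write(2, "c1")


mem.run_pass(guest_activity)    # first pass
passes = 1
while mem.annotated:            # keep copying while any portion is annotated
    mem.run_pass()
    passes += 1
```

In this run the first pass leaves only portion 0 annotated, a second pass copies it, and copying then ceases because no portion remains annotated. The sketch assumes that writes eventually stop arriving; it does not address how a system would bound the number of passes under a continuous write load.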

Any and all combinations of any of the claim elements recited in any of the claims and/or any elements described in this application, in any fashion, fall within the contemplated scope of the present disclosure and protection.

The descriptions of the various embodiments have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments.

Aspects of the present embodiments may be embodied as a system, method, or computer program product. Accordingly, aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “module,” a “system,” or a “computer.” In addition, any hardware and/or software technique, process, function, component, engine, module, or system described in the present disclosure may be implemented as a circuit or set of circuits. Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.

Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine. The instructions, when executed via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such processors may be, without limitation, general purpose processors, special-purpose processors, application-specific processors, or field-programmable gate arrays.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

While the preceding is directed to embodiments of the present disclosure, other and further embodiments of the disclosure may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

What is claimed is:
 1. One or more non-transitory computer-readable media storing program instructions that, when executed by one or more processors, cause the one or more processors to perform steps of: allocating, in a persistent memory, one or more blocks associated with a virtual memory; annotating a first portion of the virtual memory for copying in a first pass; copying the first portion into the one or more blocks in the persistent memory in the first pass; receiving a write request associated with the first portion; and in response to receiving the write request: applying the write request to the first portion; and annotating the first portion for copying in a second pass subsequent to the first pass.
 2. The one or more non-transitory computer-readable media of claim 1, wherein the steps further comprise copying the first portion into the one or more blocks in the persistent memory in the second pass.
 3. The one or more non-transitory computer-readable media of claim 1, wherein the steps further comprise, after the first portion is copied into the one or more blocks in the persistent memory in the second pass, un-annotating the first portion.
 4. The one or more non-transitory computer-readable media of claim 1, wherein the steps further comprise un-annotating the first portion after copying the first portion in the first pass.
 5. The one or more non-transitory computer-readable media of claim 1, wherein the steps further comprise: receiving a write request associated with a second portion of the virtual memory, wherein second portion is annotated for copying in the first pass; and applying the write request to the second portion.
 6. The one or more non-transitory computer-readable media of claim 5, wherein the steps further comprise, after the second portion is copied into the one or more blocks in the persistent memory, un-annotating the second portion.
 7. The one or more non-transitory computer-readable media of claim 1, wherein the steps further comprise, in response to determining that at least one portion of the virtual memory is annotated, copying the at least one portion of the virtual memory into the persistent memory in a subsequent pass.
 8. The one or more non-transitory computer-readable media of claim 1, wherein the steps further comprise, in response to determining that no portion of the virtual memory is annotated, ceasing copying of the virtual memory into the persistent memory.
 9. A method for taking a snapshot of a virtual memory of a virtual machine, comprising: allocating, in a persistent memory, one or more blocks associated with a virtual memory; annotating a first portion of the virtual memory for copying in a first pass; copying the first portion into the one or more blocks in the persistent memory in the first pass; receiving a write request associated with the first portion; and in response to receiving the write request: applying the write request to the first portion; and annotating the first portion for copying in a second pass subsequent to the first pass.
 10. The method of claim 9, further comprising copying the first portion into the one or more blocks in the persistent memory in the second pass.
 11. The method of claim 9, further comprising, after the first portion is copied into the one or more blocks in the persistent memory in the second pass, un-annotating the first portion.
 12. The method of claim 9, further comprising un-annotating the first portion after copying the first portion in the first pass.
 13. The method of claim 9, further comprising: receiving a write request associated with a second portion of the virtual memory, wherein second portion is annotated for copying in the first pass; and applying the write request to the second portion.
 14. The method of claim 13, further comprising, after the second portion is copied into the one or more blocks in the persistent memory, un-annotating the second portion.
 15. The method of claim 9, further comprising: in response to determining that at least one portion of the virtual memory is annotated, copying the at least one portion of the virtual memory into the persistent memory in a subsequent pass; and in response to determining that no portion of the virtual memory is annotated, ceasing copying of the virtual memory into the persistent memory.
 16. A system, comprising: a memory storing a set of instructions; and one or more processors that, when executing the set of instructions, are configured to: allocate, in a persistent memory, one or more blocks associated with a virtual memory; annotate a first portion of the virtual memory for copying in a first pass; copy the first portion into the one or more blocks in the persistent memory in the first pass; receive a write request associated with the first portion; and in response to receiving the write request: apply the write request to the first portion; and annotate the first portion for copying in a second pass subsequent to the first pass.
 17. The system of claim 16, wherein the one or more processors, when executing the set of instructions, are further configured to copy the first portion into the one or more blocks in the persistent memory in the second pass.
 18. The system of claim 16, wherein the one or more processors, when executing the set of instructions, are further configured to, after the first portion is copied into the one or more blocks in the persistent memory in the second pass, un-annotate the first portion.
 19. The system of claim 16, wherein the one or more processors, when executing the set of instructions, are further configured to un-annotate the first portion after copying the first portion in the first pass.
 20. The system of claim 16, wherein the one or more processors, when executing the set of instructions, are further configured to: receive a write request associated with a second portion of the virtual memory, wherein second portion is annotated for copying in the first pass; and apply the write request to the second portion.
 21. The system of claim 20, wherein the one or more processors, when executing the set of instructions, are further configured to, after the second portion is copied into the one or more blocks in the persistent memory, un-annotate the second portion.
 22. The system of claim 16, wherein the one or more processors, when executing the set of instructions, are further configured to: in response to determining that at least one portion of the virtual memory is annotated, copy the at least one portion of the virtual memory into the persistent memory in a subsequent pass; and in response to determining that no portion of the virtual memory is annotated, cease copying of the virtual memory into the persistent memory.